
9.2. Brick Configuration

Format bricks using the following configurations to enhance performance:

Procedure 9.1. Brick Configuration

  1. LVM layer

    • Creating the Physical Volume
      The pvcreate command is used to create the physical volume. The Logical Volume Manager (LVM) can use a portion of the physical volume for storing its metadata, while the rest is used as the data portion. Align the I/O at the LVM layer using the --dataalignment option while creating the physical volume.
      The command is used in the following format:
      pvcreate --dataalignment alignment_value disk
      For JBOD, use an alignment value of 256K.
      For hardware RAID, the alignment_value is obtained by multiplying the RAID stripe unit size by the number of data disks. If 12 disks are used in a RAID 6 configuration, the number of data disks is 10; if 12 disks are used in a RAID 10 configuration, the number of data disks is 6.
      For example:
      • Run the following command for RAID 6 storage with 12 disks and a stripe unit size of 128KiB:
        # pvcreate --dataalignment 1280K disk
      • Run the following command for RAID 10 storage with 12 disks and a stripe unit size of 256KiB:
        # pvcreate --dataalignment 1536K disk
      • To view the previously configured physical volume settings for --dataalignment, run the following command:
        # pvs -o +pe_start disk
          PV         VG   Fmt  Attr PSize PFree 1st PE 
          /dev/sdb        lvm2 a--  9.09t 9.09t   1.25m
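      As a quick check of the calculation described above, the alignment value can be derived from the stripe unit size and the number of data disks. The following is a minimal shell sketch using the RAID 6 example values (128KiB stripe unit, 10 data disks):
      STRIPE_UNIT_KB=128                                 # RAID stripe unit size in KiB (example value)
      DATA_DISKS=10                                      # a 12-disk RAID 6 set has 10 data disks
      (( dataalignment = STRIPE_UNIT_KB * DATA_DISKS ))
      echo "${dataalignment}K"                           # prints 1280K, the value passed to --dataalignment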
    • Creating the Volume Group
      The volume group is created using the vgcreate command. To ensure that logical volumes created in the volume group are aligned with the underlying hardware RAID, it is important to use the --physicalextentsize option.
      For JBOD, use a physical extent size of 256K.
      LVM currently supports only physical extent sizes that are a power of 2, whereas RAID full stripes are in general not a power of 2. Hence, getting proper alignment requires some extra work as outlined in this sub-section and in the sub-section on thin pool creation.
      Since a RAID full stripe may not be a power of 2, use the RAID stripe unit size, which is a power of 2, as the physical extent size when creating the volume group.
      Use the vgcreate command in the following format:
      # vgcreate --physicalextentsize RAID_stripe_unit_size VOLGROUP physical_volume
      For example, run the following command for RAID-6 storage with a stripe unit size of 128K, and 12 disks (10 data disks):
      # vgcreate --physicalextentsize 128K VOLGROUP physical_volume
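      To confirm the physical extent size of an existing volume group, it can be displayed with vgs; this is a quick check, assuming the volume group is named VOLGROUP as in the examples above:
      # vgs -o +vg_extent_size VOLGROUP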
    • Creating the Thin Pool
      A thin pool provides a common pool of storage for thin logical volumes (LVs) and their snapshot volumes, if any. It also maintains the metadata required to track the (dynamically) allocated regions of the thin LVs and snapshots. Internally, a thin pool consists of a separate data device and a metadata device.
      To create a thin pool, first create an LV to serve as the metadata device, then create an LV to serve as the data device, and finally create a thin pool from the data LV and the metadata LV.
      Creating an LV to serve as the metadata device
      The maximum possible size for a metadata LV is 16 GiB. Red Hat Storage recommends creating the metadata device of the maximum supported size. You can allocate less than the maximum if space is a concern, but in this case you should allocate a minimum of 0.5% of the data device size.
      After choosing the size of the metadata device, adjust it to be a multiple of the RAID full stripe size to allow the LV to be aligned with the hardware RAID stripe. For JBOD, this adjustment is not necessary.
      For example, to create a 16GiB metadata device on RAID 6 storage with a 128KiB stripe unit size and 12 disks (RAID full stripe of 1280KiB):
      KB_PER_GB=1048576                              # KiB per GiB
      (( metadev_sz = 16 * $KB_PER_GB / 1280 ))      # number of whole 1280KiB full stripes in 16GiB
      (( metadev_sz = $metadev_sz * 1280 ))          # size in KiB, rounded down to a full-stripe multiple
      lvcreate -L ${metadev_sz}K --name metadata_device_name VOLGROUP
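      If space is a concern and the 0.5% minimum mentioned above is used instead of the full 16GiB, the same rounding to a multiple of the RAID full stripe applies. The following is a minimal sketch assuming a 512GiB data device and the 1280KiB full stripe from the example (the result is roughly 2.5GiB):
      KB_PER_GB=1048576                                     # KiB per GiB
      (( min_metadev_sz = 512 * $KB_PER_GB / 200 ))         # 0.5% of a 512GiB data device, in KiB
      (( min_metadev_sz = $min_metadev_sz / 1280 * 1280 ))  # round down to a multiple of the 1280KiB full stripe
      lvcreate -L ${min_metadev_sz}K --name metadata_device_name VOLGROUP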
      Creating an LV to serve as the data device
      As in the case of the metadata device, adjust the data device size to be a multiple of the RAID full stripe size. For JBOD, this adjustment is not necessary.
      For example, to create a 512GiB data device on RAID 6 storage with a 128KiB stripe unit size and 12 disks (RAID full stripe of 1280KiB):
      KB_PER_GB=1048576                              # KiB per GiB
      (( datadev_sz = 512 * $KB_PER_GB / 1280 ))     # number of whole 1280KiB full stripes in 512GiB
      (( datadev_sz = $datadev_sz * 1280 ))          # size in KiB, rounded down to a full-stripe multiple
      lvcreate -L ${datadev_sz}K --name thin_pool VOLGROUP
      Creating a thin pool from the data LV and the metadata LV
      An important parameter to be specified while creating a thin pool is the chunk size. For good performance, the chunk size for the thin pool and the parameters of the underlying hardware RAID storage should be chosen so that they work well together.
      For RAID-6 storage, the striping parameters should be chosen so that the full stripe size (stripe_unit size * number of data disks) is between 1MiB and 2MiB, preferably in the low end of the range. The thin pool chunk size should be chosen to match the RAID 6 full stripe size. Matching the chunk size to the full stripe size aligns thin pool allocations with RAID 6 stripes, which can lead to better performance. Limiting the chunk size to below 2MiB helps reduce performance problems due to excessive copy-on-write when snapshots are used.
      For example, for RAID 6 with 12 disks (10 data disks), stripe unit size should be chosen as 128KiB. This leads to a full stripe size of 1280KiB (1.25MiB). The thin pool should then be created with the chunk size of 1280KiB.
      For RAID 10 storage, the preferred stripe unit size is 256KiB. This can also serve as the thin pool chunk size. Note that RAID 10 is recommended when the workload has a large proportion of small file writes or random writes. In this case, a small thin pool chunk size is more appropriate, as it reduces copy-on-write overhead with snapshots.
      For JBOD, use a thin pool chunk size of 256K.
      The following example shows how to create the thin pool from the data LV and metadata LV, created earlier:
      lvconvert --chunksize 1280K --thinpool VOLGROUP/thin_pool --poolmetadata VOLGROUP/metadata_device_name
      By default, the newly provisioned chunks in a thin pool are zeroed to prevent data leaking between different block devices. In the case of Red Hat Storage, where data is accessed via a file system, this option can be turned off for better performance.
      lvchange --zero n VOLGROUP/thin_pool
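      To verify the resulting thin pool settings, the chunk size and zeroing flag can be displayed with lvs; this is a quick check, and the exact field names may vary slightly between LVM versions:
      lvs -o name,chunksize,zero VOLGROUP/thin_pool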
    • Creating a Thin Logical Volume
      After the thin pool has been created as mentioned above, a thinly provisioned logical volume can be created in the thin pool to serve as storage for a brick of a Red Hat Storage volume.
      LVM allows multiple thinly-provisioned LVs to share a thin pool; this allows a common pool of physical storage to be used for multiple Red Hat Storage bricks and simplifies provisioning. However, such sharing of the thin pool metadata and data devices can impact performance in a number of ways.

      Note

      To avoid performance problems resulting from the sharing of the same thin pool, Red Hat Storage recommends that the LV for each Red Hat Storage brick have a dedicated thin pool of its own. As Red Hat Storage volume snapshots are created, snapshot LVs are created and share the thin pool with the brick LV.
      lvcreate --thin --name LV_name --virtualsize LV_size VOLGROUP/thin_pool
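      For example, a 500GiB thin LV for a single brick could be created as follows; the LV name brick1_lv and the 500G size are illustrative values only. Because the LV is thinly provisioned, its virtual size may exceed the physical space currently available in the thin pool:
      lvcreate --thin --name brick1_lv --virtualsize 500G VOLGROUP/thin_pool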
  2. XFS Inode Size

    As Red Hat Storage makes extensive use of extended attributes, an XFS inode size of 512 bytes works better with Red Hat Storage than the default XFS inode size of 256 bytes. Therefore, the inode size for XFS must be set to 512 bytes while formatting the Red Hat Storage bricks. To set the inode size, use the -i size option with the mkfs.xfs command, as shown in the Logical Block Size for the Directory section below.
  3. XFS RAID Alignment

    When creating an XFS file system, you can explicitly specify the striping parameters of the underlying storage in the following format:
    mkfs.xfs other_options -d su=stripe_unit_size,sw=stripe_width_in_number_of_disks device
    For RAID 6, ensure that I/O is aligned at the file system layer by providing the striping parameters. For RAID 6 storage with 12 disks, if the recommendations above have been followed, the values must be as follows:
    # mkfs.xfs other_options -d su=128K,sw=10 device
    For RAID 10 and JBOD, the -d su=<>,sw=<> option can be omitted. By default, XFS uses the thin pool chunk size and other parameters to make layout decisions.
  4. Logical Block Size for the Directory

    An XFS file system allows you to select a logical block size for the file system directories that is greater than the logical block size of the file system. Increasing the logical block size for directories from the default of 4K decreases the directory I/O, which in turn improves the performance of directory operations. To set the block size, use the -n size option with the mkfs.xfs command, as shown in the following example output.
    The following is example output for a RAID 6 configuration with the inode and directory block size options:
    # mkfs.xfs -f -i size=512 -n size=8192 -d su=128K,sw=10 logical_volume
    meta-data=/dev/mapper/gluster-brick1	isize=512    agcount=32, agsize=37748736 blks
             =				sectsz=512   attr=2, projid32bit=0
    data     = 				bsize=4096   blocks=1207959552, imaxpct=5
             =				sunit=32     swidth=320 blks
    naming   = version 2			bsize=8192   ascii-ci=0
    log      =internal log			bsize=4096   blocks=521728, version=2
             =				sectsz=512   sunit=32 blks, lazy-count=1
    realtime =none				extsz=4096   blocks=0, rtextents=0
  5. Allocation Strategy

    inode32 and inode64 are the two most common allocation strategies for XFS. With the inode32 allocation strategy, XFS places all inodes in the first 1 TiB of the disk, so on a larger disk all inodes remain confined to the first 1 TiB. The inode32 allocation strategy is used by default.
    With the inode64 mount option, inodes are placed close to the data, which minimizes disk seeks.
    To set the allocation strategy to inode64 when the file system is mounted, use the -o inode64 option with the mount command, as shown in the following Access Time section.
  6. Access Time

    If the application does not require updating the access time on files, then the file system must always be mounted with the noatime mount option. For example:
    # mount -t xfs -o inode64,noatime <logical volume> <mount point>
    This optimization improves performance of small-file reads by avoiding updates to the XFS inodes when files are read.
    The corresponding /etc/fstab entry for the inode64 and noatime mount options:
     <logical volume> <mount point> xfs     inode64,noatime   0 0
  7. Performance tuning option in Red Hat Storage

    Run the following command after creating the volume:
    # tuned-adm profile default ; tuned-adm profile rhs-high-throughput
    
    Switching to profile 'default' 
    Applying ktune sysctl settings: 
    /etc/ktune.d/tunedadm.conf:                           [  OK  ] 
    Applying sysctl settings from /etc/sysctl.conf  
    Starting tuned:                                       [  OK  ] 
    Stopping tuned:                                       [  OK  ] 
    Switching to profile 'rhs-high-throughput'
    This profile performs the following:
    • Increases read ahead to 64 MB
    • Changes I/O scheduler to deadline
    • Disables power-saving mode
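    To confirm which profile is currently applied, query the active profile; this assumes the tuned-adm utility used above is available:
    # tuned-adm active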
  8. Writeback caching

    For small-file and random write performance, we strongly recommend a writeback cache, that is, non-volatile random-access memory (NVRAM) in your storage controller. For example, typical Dell and HP storage controllers have it. Ensure that NVRAM is enabled, that is, that the battery is working. Refer to your hardware documentation for details on enabling NVRAM.
    Do not enable writeback caching in the disk drives; with this policy, the disk drive reports a write as complete before the write has actually reached the magnetic media (platter). As a result, a power failure can cause the disk write cache to lose data, or even metadata, leading to file system corruption.
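    For drives that are exposed directly to the operating system, one way to check and disable the on-drive write cache is with hdparm; the device name /dev/sdX below is a placeholder, and drives behind a hardware RAID controller are normally managed through the controller's own management tools instead. The first command reports whether the drive write cache is enabled, and the second disables it:
    # hdparm -W /dev/sdX
    # hdparm -W 0 /dev/sdX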
  9. Allocation groups

    Each XFS file system is partitioned into regions called allocation groups. Allocation groups are similar to the block groups in ext3, but allocation groups are much larger than block groups and are used for scalability and parallelism rather than disk locality. The default size of an allocation group is 1 TiB.
    The allocation group count must be large enough to sustain the concurrent allocation workload. In most cases, the allocation group count chosen by the mkfs.xfs command gives optimal performance. Do not change the allocation group count chosen by mkfs.xfs while formatting the file system.
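    The allocation group count chosen by mkfs.xfs is reported in its output (the agcount value in the example output above) and can also be inspected later on a mounted file system; the mount point below is a placeholder:
    # xfs_info /mount/point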
  10. Percentage of space allocation to inodes

    If the workload consists of very small files (average file size less than 10 KB), it is recommended to set the maxpct value to 10 while formatting the file system.
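    For example, the maxpct value can be passed as part of the -i option together with the inode size; the following is a sketch combining it with the RAID 6 options used earlier, with device as a placeholder:
    # mkfs.xfs -f -i size=512,maxpct=10 -n size=8192 -d su=128K,sw=10 device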