11.11. Rebalancing Volumes
If a volume has been expanded or shrunk using the add-brick or remove-brick commands, the data on the volume needs to be rebalanced among the servers.
Note
In a non-replicated volume, all bricks should be online to perform the rebalance operation using the start option. In a replicated volume, at least one of the bricks in the replica should be online.
To rebalance a volume, use the following command on any of the servers:
# gluster volume rebalance VOLNAME start
For example:
# gluster volume rebalance test-volume start
Starting rebalancing on volume test-volume has been successful
When run without the force option, the rebalance command attempts to balance the space utilized across nodes. Files whose migration would cause the target node to have less available space than the source node are skipped. This results in linkto files being retained, which may cause slower access when a large number of linkto files are present.
Red Hat strongly recommends that you disconnect all older clients before executing the rebalance command to avoid a potential data loss scenario.
Warning
The rebalance command can be executed with the force option even when older clients are connected to the cluster. However, this could lead to a data loss situation.
A rebalance operation with the force option balances the data based on the layout, and hence optimizes or does away with the link files, but may lead to imbalanced storage space usage across bricks. Use this option only when there are a large number of link files in the system.
To rebalance a volume forcefully, use the following command on any of the servers:
# gluster volume rebalance VOLNAME start force
For example:
# gluster volume rebalance test-volume start force
Starting rebalancing on volume test-volume has been successful
11.11.1. Rebalance Throttling
The rebalance process uses multiple threads to ensure good performance during migration of multiple files. Migrating many files at once can severely impact storage system performance, so a throttling mechanism is provided to manage the load.
By default, rebalance throttling starts in normal mode. Configure the throttling mode to adjust the rate at which files are migrated:
# gluster volume set VOLNAME rebal-throttle lazy|normal|aggressive
For example:
# gluster volume set test-volume rebal-throttle lazy
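Because an invalid mode is rejected by the CLI, a wrapper script may want to validate the chosen mode locally before applying it. This is a minimal sketch; the volume name test-volume and the chosen mode are assumptions for illustration, and the gluster command itself is only echoed, not executed.

```shell
# Minimal sketch: validate a throttle mode against the three accepted values
# before applying it with `gluster volume set`. The mode and volume name
# below are assumptions for illustration; the command is echoed, not run.
mode="lazy"
case "$mode" in
  lazy|normal|aggressive)
    echo "would run: gluster volume set test-volume rebal-throttle $mode" ;;
  *)
    echo "invalid throttle mode: $mode" >&2
    exit 1 ;;
esac
```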
11.11.2. Displaying Rebalance Progress
To display the status of a volume rebalance operation, use the following command:
# gluster volume rebalance VOLNAME status
For example:
# gluster volume rebalance test-volume status
     Node  Rebalanced-files     size  scanned  failures  skipped       status  run time in h:m:s
---------  ----------------  -------  -------  --------  -------  -----------  -----------------
localhost             71962   70.3GB   380852         0        0  in progress            2:02:20
  server1             70489   68.8GB   502185         0        0  in progress            2:02:20
  server2             70704   69.0GB   507728         0        0  in progress            2:02:20
  server3             71819   70.1GB   435611         0        0  in progress            2:02:20
Estimated time left for rebalance to complete : 2:50:24
A rebalance operation starts a rebalance process on each node of the volume. Each process is responsible for rebalancing the files on its own node. Each row of the rebalance status output describes the progress of the operation on a single node.
Important
If there is a reboot while rebalancing, the rebalance output might display an incorrect status and some files might not get rebalanced.
Workaround: After the reboot, once the previous rebalance has completed, trigger another rebalance so that the files that were not balanced during the reboot are rebalanced correctly and the rebalance output gives the correct status.
The following table describes the output of the rebalance status command:
| Property Name | Description |
| --- | --- |
| Node | The name of the node. |
| Rebalanced-files | The number of files that were successfully migrated. |
| size | The total size of the files that were migrated. |
| scanned | The number of files scanned on the node. This includes the files that were migrated. |
| failures | The number of files that could not be migrated because of errors. |
| skipped | The number of files that were skipped because of various errors or reasons. |
| status | The status of the rebalance operation on the node: in progress, completed, or failed. |
| run time in h:m:s | The amount of time for which the process has been running on the node. |
The estimated time left for the rebalance to complete on all nodes is also displayed. The estimated time to complete is displayed only after the rebalance operation has been running for 10 minutes. In cases where the remaining time is extremely large, the estimated time to completion is displayed as >2 months and the user is advised to check again later.
The time taken to complete a rebalance operation depends on the number of files estimated to be on the bricks and the rate at which files are being processed by the rebalance process. This value is recalculated every time the rebalance status command is executed, and becomes more accurate the longer the rebalance has been running and for larger data sets. The calculation assumes that a file system partition contains a single brick.
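The estimate can be understood as simple rate arithmetic: the files processed so far divided by the elapsed time give a processing rate, and the remaining files divided by that rate give the time left. The sketch below illustrates this with made-up numbers; the actual estimator inside the rebalance process is internal and more involved.

```shell
# Illustrative arithmetic only; all numbers here are assumptions, and the
# real estimator inside the rebalance process is more involved.
scanned=380852      # files processed so far on this node
runtime=7340        # elapsed run time in seconds (2:02:20)
total=1000000       # estimated total files on the brick (assumed)
remaining=$(( (total - scanned) * runtime / scanned ))
printf 'estimated seconds left: %d\n' "$remaining"
```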
A rebalance operation is considered complete when the status of every node is completed. For example:
# gluster volume rebalance test-volume status
     Node  Rebalanced-files     size  scanned  failures  skipped     status  run time in h:m:s
---------  ----------------  -------  -------  --------  -------  ---------  -----------------
    node2                 0   0Bytes        0         0        0  completed            0:02:23
    node3               234  737.8KB     3350         0      257  completed            0:02:25
    node4                 3    14.6K       71         0        6  completed            0:00:02
localhost               317    1.1MB     3484         0      155  completed            0:02:38
volume rebalance: test-volume: success
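A completion check of this kind can be scripted by inspecting the per-node status column. The snippet below works on simulated data rows so that it is self-contained; on a live cluster you would capture the output of gluster volume rebalance VOLNAME status and strip the header rows before applying the same test.

```shell
# Sketch: report whether every node has finished rebalancing. The status
# text below is simulated (data rows only); on a real cluster capture it
# with `gluster volume rebalance VOLNAME status` and strip the header rows.
status='node2 0 0Bytes 0 0 0 completed 0:02:23
node3 234 737.8KB 3350 0 257 in progress 0:02:25
localhost 317 1.1MB 3484 0 155 completed 0:02:38'

# Any data row that does not contain "completed" means work is still running.
if echo "$status" | grep -vq completed; then
  echo "rebalance still running on at least one node"
else
  echo "rebalance complete on all nodes"
fi
```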
With this release, details about the files that are skipped during a rebalance operation can be obtained. Entries for all such files are available in the rebalance log with the message ID 109126. You can search for the message ID in the log file to get the list of all skipped files.
For example:
# grep -i 109126 /var/log/glusterfs/test-volume-rebalance.log
[2018-03-15 09:14:30.203393] I [MSGID: 109126] [dht-rebalance.c:2715:gf_defrag_migrate_single_file] 0-test-volume-dht: File migration skipped for /linux-4.9.27/Documentation/ABI/stable/sysfs-fs-orangefs.
[2018-03-15 09:14:31.262969] I [MSGID: 109126] [dht-rebalance.c:2715:gf_defrag_migrate_single_file] 0-test-volume-dht: File migration skipped for /linux-4.9.27/Documentation/ABI/stable/sysfs-devices.
[2018-03-15 09:14:31.842631] I [MSGID: 109126] [dht-rebalance.c:2715:gf_defrag_migrate_single_file] 0-test-volume-dht: File migration skipped for /linux-4.9.27/Documentation/ABI/stable/sysfs-devices-system-cpu.
[2018-03-15 09:14:33.733728] I [MSGID: 109126] [dht-rebalance.c:2715:gf_defrag_migrate_single_file] 0-test-volume-dht: File migration skipped for /linux-4.9.27/Documentation/ABI/testing/sysfs-bus-fcoe.
[2018-03-15 09:14:35.576404] I [MSGID: 109126] [dht-rebalance.c:2715:gf_defrag_migrate_single_file] 0-test-volume-dht: File migration skipped for /linux-4.9.27/Documentation/ABI/testing/sysfs-bus-iio-frequency-ad9523.
[2018-03-15 09:14:43.378480] I [MSGID: 109126] [dht-rebalance.c:2715:gf_defrag_migrate_single_file] 0-test-volume-dht: File migration skipped for /linux-4.9.27/Documentation/DocBook/kgdb.tmpl.
To know more about the failed files, search for 'migrate-data failed' in the rebalance log. However, the count of failed files reported by the rebalance status will not match the number of 'migrate-data failed' entries in the log, because the failure count includes all possible failures, not just file migration failures.
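Both counts can be pulled from the log with grep. The log content below is abbreviated sample data created in a temporary file so the sketch is runnable; a real log lives at /var/log/glusterfs/VOLNAME-rebalance.log and its lines are longer.

```shell
# Sketch: count skipped and failed entries in a rebalance log. The log
# lines below are abbreviated sample data; a real log lives at
# /var/log/glusterfs/VOLNAME-rebalance.log.
log=$(mktemp)
cat > "$log" <<'EOF'
[2018-03-15 09:14:30.203393] I [MSGID: 109126] 0-test-volume-dht: File migration skipped for /a
[2018-03-15 09:14:31.262969] I [MSGID: 109126] 0-test-volume-dht: File migration skipped for /b
[2018-03-15 09:14:32.100000] E 0-test-volume-dht: migrate-data failed for /c
EOF
echo "skipped: $(grep -c 109126 "$log")"
echo "failed:  $(grep -c 'migrate-data failed' "$log")"
```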
11.11.3. Stopping a Rebalance Operation
To stop a rebalance operation, use the following command:
# gluster volume rebalance VOLNAME stop
For example:
# gluster volume rebalance test-volume stop
     Node  Rebalanced-files     size  scanned  failures  skipped   status  run time in h:m:s
---------  ----------------  -------  -------  --------  -------  -------  -----------------
localhost            106504   104.0GB  558111         0        0  stopped            3:02:24
  server1            102299    99.9GB  725239         0        0  stopped            3:02:24
  server2            102264    99.9GB  737364         0        0  stopped            3:02:24
  server3            106813   104.3GB  646581         0        0  stopped            3:02:24
Estimated time left for rebalance to complete : 2:06:38