16.14. Troubleshooting
- Situation
Snapshot creation fails.
Step 1: Check if the bricks are thinly provisioned by following these steps:
- Execute the mount command and check the device name mounted on the brick path. For example:
# mount
/dev/mapper/snap_lvgrp-snap_lgvol on /brick/brick-dirs type xfs (rw)
/dev/mapper/snap_lvgrp1-snap_lgvol1 on /brick/brick-dirs1 type xfs (rw)
- Run the following command to check if the device has an LV pool name:
# lvs device-name
If the Pool field in the output is empty, then the brick is not thinly provisioned.
- Ensure that the brick is thinly provisioned, and retry the snapshot create command.
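The check in Step 1 can also be scripted. The following is a minimal sketch, assuming the brick device has already been identified from the mount output (the /dev/mapper/snap_lvgrp-snap_lgvol path is only an example); it reads the thin pool name with lvs and reports whether the brick is thinly provisioned.
#!/bin/bash
# Minimal sketch: report whether a brick device is thinly provisioned.
# DEVICE is an example path; substitute the device mounted on your brick path.
DEVICE=/dev/mapper/snap_lvgrp-snap_lgvol

# -o pool_lv prints only the thin pool name; --noheadings drops the header line.
pool=$(lvs --noheadings -o pool_lv "$DEVICE" | tr -d '[:space:]')

if [ -n "$pool" ]; then
    echo "$DEVICE is thinly provisioned (pool: $pool)"
else
    echo "$DEVICE is not thinly provisioned; snapshot creation will fail for this brick"
fi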
Step 2: Check if the bricks are down by following these steps:
- Execute the following command to check the status of the volume:
# gluster volume status VOLNAME
- If any bricks are down, then start the bricks by executing the following command:
# gluster volume start VOLNAME force
- To verify if the bricks are up, execute the following command:
# gluster volume status VOLNAME
- Retry the snapshot create command.
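This check can be automated as well. Below is a minimal sketch, assuming VOLNAME is a placeholder for your volume and that the brick lines of the gluster volume status output follow the usual layout in which the Online column (Y/N) is the second-to-last field.
#!/bin/bash
# Minimal sketch: report bricks whose Online column is "N" and force-start the volume if any are found.
VOLNAME=testvol   # placeholder volume name

offline=$(gluster volume status "$VOLNAME" | awk '/^Brick / && $(NF-1) == "N" {print $2}')

if [ -n "$offline" ]; then
    echo "Offline bricks found:"
    echo "$offline"
    gluster volume start "$VOLNAME" force   # restart the offline brick processes
else
    echo "All bricks of $VOLNAME are online"
fi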
Step 3: Check if the node is down by following these steps:
- Execute the following command to check the status of the nodes:
# gluster volume status VOLNAME
- If a brick is not listed in the status, then execute the following command:
# gluster pool list
- If the status of the node hosting the missing brick is Disconnected, then power up the node.
- Retry the snapshot create command.
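A quick way to spot disconnected nodes is to filter the pool list output. This is a minimal sketch, assuming the usual three-column UUID / Hostname / State layout of gluster pool list.
#!/bin/bash
# Minimal sketch: list peers that gluster pool list reports as Disconnected.
disconnected=$(gluster pool list | awk 'NR > 1 && $3 == "Disconnected" {print $2}')

if [ -n "$disconnected" ]; then
    echo "Power up or reconnect the following nodes before retrying the snapshot:"
    echo "$disconnected"
else
    echo "All peers are connected"
fi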
Step 4: Check if rebalance is in progress by following these steps:
- Execute the following command to check the rebalance status:
# gluster volume rebalance VOLNAME status
- If rebalance is in progress, wait for it to finish.
- Retry the snapshot create command.
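Instead of checking manually, you can poll the rebalance status until it is no longer in progress. A minimal sketch, with VOLNAME and the 30-second interval as placeholders:
#!/bin/bash
# Minimal sketch: wait until no node reports the rebalance as "in progress".
VOLNAME=testvol   # placeholder volume name

while gluster volume rebalance "$VOLNAME" status | grep -q "in progress"; do
    echo "Rebalance still running on $VOLNAME; waiting..."
    sleep 30
done
echo "Rebalance finished; retry the snapshot create command"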
- Situation
Snapshot delete fails.
Step 1: Check if the server quorum is met by following these steps:
- Execute the following command to check the peer status:
# gluster pool list
- If nodes are down, and the cluster is not in quorum, then power up the nodes.
- To verify if the cluster is in quorum, execute the following command:
# gluster pool list
- Retry the snapshot delete command.
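Server quorum requires more than half of the nodes in the trusted storage pool to be up. The following is a rough, minimal sketch of that check, assuming the default quorum ratio and the usual UUID / Hostname / State layout of gluster pool list.
#!/bin/bash
# Minimal sketch: compare connected peers against the total pool size.
total=$(gluster pool list | awk 'NR > 1' | wc -l)
up=$(gluster pool list | awk 'NR > 1 && $3 == "Connected"' | wc -l)

echo "$up of $total nodes connected"
if [ $((up * 2)) -gt "$total" ]; then
    echo "Cluster appears to be in quorum"
else
    echo "Cluster is not in quorum; power up the disconnected nodes first"
fi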
- Situation
Snapshot delete command fails on some node(s) during commit phase, leaving the system inconsistent.
Solution
- Identify the node(s) where the delete command failed. This information is available in the delete command's error output. For example:
# gluster snapshot delete snapshot1
Deleting snap will erase all the information about the snap. Do you still want to continue? (y/n) y
snapshot delete: failed: Commit failed on 10.00.00.02. Please check log file for details.
Snapshot command failed
- On the node where the delete command failed, bring down glusterd using the following command:
# service glusterd stop
- Delete that particular snap's repository in /var/lib/glusterd/snaps/ from that node. For example:
# rm -rf /var/lib/glusterd/snaps/snapshot1
- Start glusterd on that node using the following command:
# service glusterd start
- Repeat steps 2, 3, and 4 on all the nodes where the commit failed, as identified in step 1.
- Retry deleting the snapshot. For example:
# gluster snapshot delete snapshot1
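The per-node cleanup above can be wrapped in a small script and run on each node where the commit failed. A minimal sketch, with SNAPNAME standing in for the snapshot that could not be deleted:
#!/bin/bash
# Minimal sketch: remove the stale snap repository on a node where the commit failed.
SNAPNAME=snapshot1   # placeholder snapshot name

service glusterd stop
rm -rf /var/lib/glusterd/snaps/"$SNAPNAME"   # delete that snapshot's repository
service glusterd start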
- Situation
Snapshot restore fails.
Step 1: Check if the server quorum is met by following these steps:
- Execute the following command to check the peer status:
# gluster pool list
- If nodes are down, and the cluster is not in quorum, then power up the nodes.
- To verify if the cluster is in quorum, execute the following command:
# gluster pool list
- Retry the snapshot restore command.
Step 2: Check if the volume is in Stop state by following these steps:
- Execute the following command to check the volume info:
# gluster volume info VOLNAME
- If the volume is in Started state, then stop the volume using the following command:
# gluster volume stop VOLNAME
- Retry the snapshot restore command.
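If you want to guard against restoring while the volume is still running, the state check and the stop can be combined. A minimal sketch, with VOLNAME as a placeholder; it relies on the Status: line of the gluster volume info output, and gluster volume stop still asks for interactive confirmation:
#!/bin/bash
# Minimal sketch: stop the volume only if it is still reported as Started.
VOLNAME=testvol   # placeholder volume name

if gluster volume info "$VOLNAME" | grep -q "^Status: Started"; then
    gluster volume stop "$VOLNAME"   # prompts for confirmation
fi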
- Situation
The brick process is hung.
Solution
Check if the LVM data/metadata utilization has reached 100% by following these steps:
- Execute the mount command and check the device name mounted on the brick path. For example:
# mount
/dev/mapper/snap_lvgrp-snap_lgvol on /brick/brick-dirs type xfs (rw)
/dev/mapper/snap_lvgrp1-snap_lgvol1 on /brick/brick-dirs1 type xfs (rw)
- Execute the following command to check if the data/metadata utilization has reached 100%:
# lvs -v device-name
For example:
# lvs -o data_percent,metadata_percent -v /dev/mapper/snap_lvgrp-snap_lgvol
  Using logical volume(s) on command line
  Data%  Meta%
  0.40
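To catch the data or metadata pool filling up before the brick process hangs, the lvs check above can be run periodically against a threshold. A minimal sketch, where DEVICE and the 90% threshold are example values:
#!/bin/bash
# Minimal sketch: warn when thin pool data or metadata usage crosses a threshold.
DEVICE=/dev/mapper/snap_lvgrp-snap_lgvol   # example device; use your brick device
THRESHOLD=90

read -r data meta <<< "$(lvs --noheadings -o data_percent,metadata_percent "$DEVICE")"

for value in "$data" "$meta"; do
    # Compare the integer part of the percentage against the threshold.
    if [ -n "$value" ] && [ "${value%.*}" -ge "$THRESHOLD" ]; then
        echo "WARNING: $DEVICE usage ($value%) is at or above ${THRESHOLD}%"
    fi
done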
Note
Ensure that the data and metadata utilization does not reach the maximum limit. Using monitoring tools such as Nagios helps you avoid such situations. For more information about Nagios, see Chapter 17, Monitoring Red Hat Gluster Storage.
- Situation
Snapshot commands fail.
Step 1: Check if there is a mismatch in the operating versions by following these steps:
- Open the following file and check for the operating version:
/var/lib/glusterd/glusterd.info
If the operating-version is less than 30000, then snapshot commands are not supported in the version the cluster is operating on.
- Upgrade all nodes in the cluster to Red Hat Gluster Storage 3.1.
- Retry the snapshot command.
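The version check in Step 1 can also be done from the shell. A minimal sketch that reads the operating-version key out of glusterd.info and compares it against 30000:
#!/bin/bash
# Minimal sketch: check whether the cluster operating version supports snapshots.
opver=$(awk -F= '/^operating-version/ {print $2}' /var/lib/glusterd/glusterd.info)

if [ -n "$opver" ] && [ "$opver" -ge 30000 ]; then
    echo "operating-version $opver supports snapshot commands"
else
    echo "operating-version ${opver:-unknown} is too low; upgrade all nodes first"
fi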
- Situation
After a rolling upgrade, the snapshot feature does not work.
Solution
You must make the following changes on the cluster to enable snapshots:
- Restart the volume using the following commands:
# gluster volume stop VOLNAME
# gluster volume start VOLNAME
- Restart glusterd services on all nodes.
# service glusterd restart
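These two steps can be combined into one small script run from any one node. A minimal sketch, assuming VOLNAME and the node names are placeholders and that passwordless ssh to each node is available; gluster volume stop asks for confirmation interactively:
#!/bin/bash
# Minimal sketch: restart the volume once, then restart glusterd on every node.
VOLNAME=testvol            # placeholder volume name
NODES="node1 node2 node3"  # placeholder node list

gluster volume stop "$VOLNAME"    # prompts for confirmation
gluster volume start "$VOLNAME"

for node in $NODES; do
    ssh "$node" service glusterd restart
done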