Chapter 11. Troubleshooting
This chapter describes the most common troubleshooting scenarios related to Red Hat OpenShift Container Storage.
- What to do if a Red Hat OpenShift Container Storage node fails
If a Red Hat OpenShift Container Storage node fails and you want to delete it, disable the node before deleting it. For more information, see Section 1.2.4, “Deleting Node”.
If a Red Hat OpenShift Container Storage node fails and you want to replace it, see Section 1.3.2, “Replacing Nodes”.
- What to do if a Red Hat OpenShift Container Storage device fails
If a Red Hat OpenShift Container Storage device fails and you want to delete it, disable the device before deleting it. For more information, see Section 1.2.3, “Deleting Device”.
If a Red Hat OpenShift Container Storage device fails and you want to replace it, see Section 1.3.1, “Replacing Devices”.
- What to do if Red Hat OpenShift Container Storage volumes require more capacity
You can increase the storage capacity by adding devices, increasing the cluster size, or adding an entirely new cluster. For more information, see Section 1.1, “Increasing Storage Capacity”.
- How to upgrade OpenShift when Red Hat OpenShift Container Storage is installed
To upgrade OpenShift Container Platform, see https://access.redhat.com/documentation/en-us/openshift_container_platform/3.11/html/upgrading_clusters/install-config-upgrading-automated-upgrades#upgrading-to-ocp-3-10.
- Viewing Log Files
Viewing Red Hat Gluster Storage Container Logs
Debugging information related to Red Hat Gluster Storage containers is stored on the host where the containers are started. Specifically, the logs and configuration files can be found at the following locations on the OpenShift nodes where the Red Hat Gluster Storage server containers run:
- /etc/glusterfs
- /var/lib/glusterd
- /var/log/glusterfs
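To scan the log directory listed above for error-level entries, something like the following may help. This is a minimal sketch, not a supported tool: the function name `gluster_errors` is hypothetical, and it assumes that Gluster log lines carry a single-letter severity after the bracketed timestamp (`E` for errors).

```shell
# Print error-level lines from every *.log file under a Gluster log directory.
# Assumption: log lines look like "[<timestamp>] E [<details>] <message>",
# where the single letter after the timestamp is the severity.
gluster_errors() {
    # $1: log directory (defaults to the location listed above)
    find "${1:-/var/log/glusterfs}" -type f -name '*.log' \
        -exec grep -H '^\[[^]]*] E \[' {} +
}
```

Run `gluster_errors` on the OpenShift node that hosts the Red Hat Gluster Storage container. Each match is prefixed with the name of the file it came from.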
Viewing Heketi Logs
Debugging information related to Heketi is stored locally in the container or in the persistent volume that is provided to the Heketi container.
You can obtain logs for Heketi by running the following command on the OpenShift node where the container is running:
# docker logs <container-id>
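If the container ID is not known, it can be looked up first. The sketch below wraps both steps in a hypothetical helper, `heketi_logs`. It assumes that the Heketi container's name contains the string "heketi", which may not hold in every deployment.

```shell
# Show the most recent log lines of every running container whose name
# contains "heketi". Run on the OpenShift node that hosts the Heketi pod.
# Assumption: the container name contains the string "heketi".
heketi_logs() {
    # $1: number of trailing lines to show per container (default 100)
    for cid in $(docker ps --quiet --filter "name=heketi"); do
        echo "== container $cid =="
        docker logs --tail "${1:-100}" "$cid" 2>&1
    done
}
```

For example, `heketi_logs 50` prints the last 50 lines for each matching container.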
- Heketi command returns with no error or empty error
Sometimes, running a heketi-cli command returns no error or an empty error such as Error. This is usually because the Heketi server is not properly configured. First ping the server to validate that it is available, and then verify it with a curl command against the /hello endpoint:
# curl http://deploy-heketi-storage-project.cloudapps.mystorage.com/hello
- Heketi reports an error while loading the topology file
Running heketi-cli reports the error "Unable to open topology file" while loading the topology file. This can be caused by using the old syntax, a single hyphen (-), as the prefix for the JSON option. Use the new syntax with double hyphens (--) and reload the topology file.
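As an illustration of the double-hyphen syntax, the sketch below loads a topology file after checking that the file is readable. The helper name `load_topology` is hypothetical, and the sketch assumes that heketi-cli is on the PATH and is already configured to reach the Heketi server.

```shell
# Load a Heketi topology file using the double-hyphen option syntax.
# Assumption: heketi-cli is on PATH and already pointed at the Heketi server.
load_topology() {
    # $1: path to the topology JSON file
    if [ ! -r "$1" ]; then
        echo "topology file not readable: $1" >&2
        return 1
    fi
    heketi-cli topology load --json="$1"
}
```

For example, `load_topology topology.json` runs `heketi-cli topology load --json=topology.json`.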
- cURL command to the Heketi server fails or does not respond
If the router or Heketi is not configured properly, error messages from Heketi may not be clear. To troubleshoot, ping the Heketi service using the endpoint and also using the IP address. If the ping by IP address succeeds and the ping by endpoint fails, this indicates a router configuration error.
After the router is set up properly, run a simple curl command like the following:
# curl http://deploy-heketi-storage-project.cloudapps.mystorage.com/hello
If Heketi is configured correctly, a welcome message from Heketi is displayed. If not, check the Heketi configuration.
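The checks above can be chained into one helper. This is a sketch under stated assumptions: the function name `diagnose_heketi` is hypothetical, both arguments are placeholders that you supply, and ICMP ping must be permitted on your network for the results to mean anything.

```shell
# Classify a Heketi connectivity problem from ping and curl results.
# Assumptions: ICMP ping is permitted on the network, and both arguments
# are placeholders supplied by the caller.
diagnose_heketi() {
    # $1: Heketi endpoint host name, $2: Heketi service IP address
    if ! ping -c 1 "$2" >/dev/null 2>&1; then
        echo "Heketi service is not reachable by IP address"
    elif ! ping -c 1 "$1" >/dev/null 2>&1; then
        echo "IP address responds but the endpoint does not: check the router configuration"
    elif ! curl --fail --silent --max-time 10 "http://$1/hello"; then
        echo "endpoint responds but /hello failed: check the Heketi configuration"
    fi
}
```

When all three checks pass, the only output is the welcome message returned by the /hello endpoint.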
- Heketi fails to start when a Red Hat Gluster Storage volume is used to store the heketi.db file
Sometimes Heketi fails to start when a Red Hat Gluster Storage volume is used to store heketi.db, and reports the following error:
[heketi] INFO 2016/06/23 08:33:47 Loaded kubernetes executor
[heketi] ERROR 2016/06/23 08:33:47 /src/github.com/heketi/heketi/apps/glusterfs/app.go:149: write /var/lib/heketi/heketi.db: read-only file system
ERROR: Unable to start application
The read-only file system error shown above can occur when a Red Hat Gluster Storage volume is used as the backend and the volume has lost quorum. In a replica-3 volume, this happens if 2 of the 3 bricks are down. Ensure that quorum is met for the Heketi Gluster volume so that it can write to the heketi.db file again.
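The quorum arithmetic can be sketched as a small check. The helper name `quorum_met` is hypothetical, and the rule it encodes (more than half of the bricks must be online) is a simplifying assumption that matches the replica-3 case described above. Your volume's quorum options may differ.

```shell
# Report whether enough bricks are online for a replicated volume to stay
# writable. Assumption: quorum requires more than half of the bricks, so a
# replica-3 volume needs at least 2 of its 3 bricks online.
quorum_met() {
    # $1: bricks online, $2: total bricks in the replica set
    [ $(( $1 * 2 )) -gt "$2" ]
}
```

For example, `quorum_met 1 3` fails, which corresponds to the case where 2 of the 3 bricks are down. The number of online bricks can be read from the output of `gluster volume status <volume-name>` on a Red Hat Gluster Storage node.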
Even if you see a different error, it is recommended practice to check whether the Red Hat Gluster Storage volume serving the heketi.db file is available. Denied access to the heketi.db file is the most common reason for Heketi failing to start.