OpenShift Container Storage is now OpenShift Data Foundation starting with version 4.9.
Troubleshooting OpenShift Data Foundation
Instructions on troubleshooting OpenShift Data Foundation
Abstract
Making open source more inclusive
Red Hat is committed to replacing problematic language in our code, documentation, and web properties. We are beginning with these four terms: master, slave, blacklist, and whitelist. Because of the enormity of this endeavor, these changes will be implemented gradually over several upcoming releases. For more details, see our CTO Chris Wright’s message.
Providing feedback on Red Hat documentation
We appreciate your input on our documentation. Do let us know how we can make it better.
To give feedback, create a Bugzilla ticket:
- Go to the Bugzilla website.
- In the Component section, choose documentation.
- Fill in the Description field with your suggestion for improvement. Include a link to the relevant part(s) of documentation.
- Click Submit Bug.
Chapter 1. Overview
Troubleshooting OpenShift Data Foundation is written to help administrators understand how to troubleshoot and fix their Red Hat OpenShift Data Foundation cluster.
Most troubleshooting tasks focus on either a fix or a workaround. This document is divided into chapters based on the errors that an administrator may encounter:
- Chapter 2, Downloading log files and diagnostic information using must-gather shows you how to use the must-gather utility in OpenShift Data Foundation.
- Chapter 3, Commonly required logs for troubleshooting shows you how to obtain commonly required log files for OpenShift Data Foundation.
- Chapter 6, Troubleshooting alerts and errors in OpenShift Data Foundation shows you how to identify the encountered error and perform required actions.
Red Hat does not support running Ceph commands in OpenShift Data Foundation clusters (unless indicated by Red Hat support or Red Hat documentation) as it can cause data loss if you run the wrong commands. In that case, the Red Hat support team is only able to provide commercially reasonable effort and may not be able to restore all the data in case of any data loss.
Chapter 2. Downloading log files and diagnostic information using must-gather
If Red Hat OpenShift Data Foundation is unable to automatically resolve a problem, use the must-gather tool to collect log files and diagnostic information so that you or Red Hat support can review the problem and determine a solution.
When Red Hat OpenShift Data Foundation is deployed in external mode, must-gather only collects logs from the OpenShift Data Foundation cluster and does not collect debug data and logs from the external Red Hat Ceph Storage cluster. To collect debug logs from the external Red Hat Ceph Storage cluster, see Red Hat Ceph Storage Troubleshooting guide and contact your Red Hat Ceph Storage Administrator.
Prerequisites
- Optional: If OpenShift Data Foundation is deployed in a disconnected environment, ensure that you mirror the individual must-gather image to the mirror registry available from the disconnected environment.

  $ oc image mirror registry.redhat.io/odf4/ocs-must-gather-rhel8:v4.10 <local-registry>/odf4/ocs-must-gather-rhel8:v4.10 [--registry-config=<path-to-the-registry-config>] [--insecure=true]

  <local-registry>
  Is the local image mirror registry available for a disconnected OpenShift Container Platform cluster.
  <path-to-the-registry-config>
  Is the path to your registry credentials. By default, it is ~/.docker/config.json.
  --insecure
  Add this flag only if the mirror registry is insecure.

  For more information, see the Red Hat Knowledgebase solutions.
Procedure
Run the must-gather command from the client connected to the OpenShift Data Foundation cluster:

$ oc adm must-gather --image=registry.redhat.io/odf4/ocs-must-gather-rhel8:v4.10 --dest-dir=<directory-name>

<directory-name>
Is the name of the directory where you want to write the data to.

Important: For a disconnected environment deployment, replace the image in the --image parameter with the mirrored must-gather image.

$ oc adm must-gather --image=<local-registry>/odf4/ocs-must-gather-rhel8:v4.10 --dest-dir=<directory-name>

<local-registry>
Is the local image mirror registry available for a disconnected OpenShift Container Platform cluster.
This collects the following information in the specified directory:
- All Red Hat OpenShift Data Foundation cluster related Custom Resources (CRs) with their namespaces.
- Pod logs of all the Red Hat OpenShift Data Foundation related pods.
- Output of some standard Ceph commands like Status, Cluster health, and others.
Command variations
If one or more master nodes are not in the Ready state, use --node-name to provide a master node that is Ready so that the must-gather pod can be safely scheduled.

$ oc adm must-gather --image=registry.redhat.io/odf4/ocs-must-gather-rhel8:v4.10 --dest-dir=<directory-name> --node-name=<node-name>

If you want to gather information from a specific time:

To specify a relative time period for logs gathered, such as within 5 seconds or 2 days, add /usr/bin/gather since=<duration>:

$ oc adm must-gather --image=registry.redhat.io/odf4/ocs-must-gather-rhel8:v4.10 --dest-dir=<directory-name> /usr/bin/gather since=<duration>

To specify a specific time to gather logs after, add /usr/bin/gather since-time=<rfc3339-timestamp>:

$ oc adm must-gather --image=registry.redhat.io/odf4/ocs-must-gather-rhel8:v4.10 --dest-dir=<directory-name> /usr/bin/gather since-time=<rfc3339-timestamp>
Replace the example values in these commands as follows:

<node-name>
If one or more master nodes are not in the Ready state, use this parameter to provide the name of a master node that is still in the Ready state. This avoids scheduling errors by ensuring that the must-gather pod is not scheduled on a master node that is not ready.

<directory-name>
The directory to store information collected by must-gather.

<duration>
Specify the period of time to collect information from as a relative duration, for example, 5h (starting from 5 hours ago).

<rfc3339-timestamp>
Specify the period of time to collect information from as an RFC 3339 timestamp, for example, 2020-11-10T04:00:00+00:00 (starting from 4am UTC on 10 November 2020).
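For example, a single invocation that combines a destination directory, a healthy master node, and a relative time window could look like the following sketch. The node name and directory are illustrative placeholders, not values from your cluster:

$ oc adm must-gather --image=registry.redhat.io/odf4/ocs-must-gather-rhel8:v4.10 --dest-dir=/tmp/odf-must-gather --node-name=master-0.example.com /usr/bin/gather since=24h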
Chapter 3. Commonly required logs for troubleshooting
Some of the commonly used logs for troubleshooting OpenShift Data Foundation are listed, along with the commands to generate them.
Generating logs for a specific pod:
$ oc logs <pod-name> -n <namespace>

Generating logs for Ceph or OpenShift Data Foundation cluster:
$ oc logs rook-ceph-operator-<ID> -n openshift-storage

Important: Currently, the rook-ceph-operator logs do not provide any information about the failure, which is a limitation when troubleshooting issues. See Enabling and disabling debug logs for rook-ceph-operator.
Generating logs for plugin pods like cephfs or rbd to detect any problem in the PVC mount of the app-pod:
$ oc logs csi-cephfsplugin-<ID> -n openshift-storage -c csi-cephfsplugin
$ oc logs csi-rbdplugin-<ID> -n openshift-storage -c csi-rbdplugin

To generate logs for all the containers in the CSI pod:

$ oc logs csi-cephfsplugin-<ID> -n openshift-storage --all-containers
$ oc logs csi-rbdplugin-<ID> -n openshift-storage --all-containers
Generating logs for cephfs or rbd provisioner pods to detect problems if PVC is not in BOUND state:
$ oc logs csi-cephfsplugin-provisioner-<ID> -n openshift-storage -c csi-cephfsplugin
$ oc logs csi-rbdplugin-provisioner-<ID> -n openshift-storage -c csi-rbdplugin

To generate logs for all the containers in the CSI pod:

$ oc logs csi-cephfsplugin-provisioner-<ID> -n openshift-storage --all-containers
$ oc logs csi-rbdplugin-provisioner-<ID> -n openshift-storage --all-containers
Generating OpenShift Data Foundation logs using cluster-info command:
$ oc cluster-info dump -n openshift-storage --output-directory=<directory-name>

When using the Local Storage Operator, generate logs with the cluster-info command:

$ oc cluster-info dump -n openshift-local-storage --output-directory=<directory-name>

Check the OpenShift Data Foundation operator logs and events.
To check the operator logs:

# oc logs <ocs-operator> -n openshift-storage

<ocs-operator>
Is the name of the ocs-operator pod, which you can get with:

# oc get pods -n openshift-storage | grep -i "ocs-operator" | awk '{print $1}'
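For convenience, both steps can be combined into a single command. This is a sketch and assumes exactly one ocs-operator pod is running:

# oc logs -n openshift-storage $(oc get pods -n openshift-storage | grep -i "ocs-operator" | awk '{print $1}')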
To check the operator events:

# oc get events --sort-by=metadata.creationTimestamp -n openshift-storage
Get the OpenShift Data Foundation operator version and channel.
# oc get csv -n openshift-storage

Example output:

NAME                              DISPLAY                       VERSION   REPLACES   PHASE
mcg-operator.v4.10.0              NooBaa Operator               4.10.0               Succeeded
ocs-operator.v4.10.0              OpenShift Container Storage   4.10.0               Succeeded
odf-csi-addons-operator.v4.10.0   CSI Addons                    4.10.0               Succeeded
odf-operator.v4.10.0              OpenShift Data Foundation     4.10.0               Succeeded

# oc get subs -n openshift-storage

Example output:

NAME                                                               PACKAGE                   SOURCE             CHANNEL
mcg-operator-stable-4.10-redhat-operators-openshift-marketplace   mcg-operator              redhat-operators   stable-4.10
ocs-operator-stable-4.10-redhat-operators-openshift-marketplace   ocs-operator              redhat-operators   stable-4.10
odf-csi-addons-operator                                            odf-csi-addons-operator   redhat-operators   stable-4.10
odf-operator                                                       odf-operator              redhat-operators   stable-4.10

Confirm that the installplan is created.
# oc get installplan -n openshift-storage

Verify the image of the components after updating OpenShift Data Foundation.
Check the node on which the pod of the component whose image you want to verify is running.
# oc get pods -o wide | grep <component-name>

For example:

# oc get pods -o wide | grep rook-ceph-operator

Example output:

rook-ceph-operator-566cc677fd-bjqnb 1/1 Running 20 4h6m 10.128.2.5 dell-r440-12.gsslab.pnq2.redhat.com <none> <none>

dell-r440-12.gsslab.pnq2.redhat.com is the node name.

Check the image ID.
# oc debug node/<node-name>

<node-name>
Is the name of the node on which the pod of the component whose image you want to verify is running.
# chroot /host

# crictl images | grep <component>

For example:

# crictl images | grep rook-ceph

Take a note of the IMAGE ID and map it to the Digest ID on the Rook Ceph Operator page.
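As an alternative sketch, the image reference can also be read directly from the pod specification without logging in to the node. The pod name below is taken from the earlier example output and is only illustrative:

# oc get pod rook-ceph-operator-566cc677fd-bjqnb -n openshift-storage -o jsonpath='{.spec.containers[*].image}{"\n"}'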
Additional resources
Chapter 4. Overriding the cluster-wide default node selector for OpenShift Data Foundation post deployment
When a cluster-wide default node selector is used for OpenShift Data Foundation, the pods generated by CSI daemonsets are able to start only on the nodes that match the selector. To be able to use OpenShift Data Foundation from nodes which do not match the selector, override the cluster-wide default node selector by performing the following steps in the command line interface:
Procedure
Specify a blank node selector for the openshift-storage namespace.
$ oc annotate namespace openshift-storage openshift.io/node-selector=

Delete the original pods generated by the DaemonSets.

$ oc delete pod -l app=csi-cephfsplugin -n openshift-storage
$ oc delete pod -l app=csi-rbdplugin -n openshift-storage
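To confirm that the CSI plugin pods are rescheduled across all the intended nodes, a quick check such as the following sketch can be used; the label selectors are the same ones used in the delete commands above:

$ oc get pods -n openshift-storage -o wide -l app=csi-cephfsplugin
$ oc get pods -n openshift-storage -o wide -l app=csi-rbdplugin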
Chapter 5. Encryption token is deleted or expired
Use this procedure to update the token if the encryption token for your key management system gets deleted or expires.
Prerequisites
- Ensure that you have a new token with the same policy as the deleted or expired token.
Procedure
- Log in to OpenShift Container Platform Web Console.
- Click Workloads → Secrets
To update the ocs-kms-token used for cluster-wide encryption:

- Set the Project to openshift-storage.
- Click ocs-kms-token → Actions → Edit Secret.
- Drag and drop or upload your encryption token file in the Value field. The token can either be a file or text that can be copied and pasted.
- Click Save.
To update the ceph-csi-kms-token for a given project or namespace with encrypted persistent volumes:

- Select the required Project.
- Click ceph-csi-kms-token → Actions → Edit Secret.
- Drag and drop or upload your encryption token file in the Value field. The token can either be a file or text that can be copied and pasted.
- Click Save.

Note: The token can be deleted only after all the encrypted PVCs using the ceph-csi-kms-token have been deleted.
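If you prefer the command line, the same secret update can be sketched with oc set data. The file name token.txt is a placeholder for your new token file, and this assumes the secret stores the token under a key named token:

$ oc set data secret/ocs-kms-token -n openshift-storage --from-file=token=token.txt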
Chapter 6. Troubleshooting alerts and errors in OpenShift Data Foundation
6.1. Resolving alerts and errors
Red Hat OpenShift Data Foundation can detect and automatically resolve a number of common failure scenarios. However, some problems require administrator intervention.
To know the errors currently firing, check one of the following locations:
- Observe → Alerting → Firing option
- Home → Overview → Cluster tab
- Storage → Data Foundation → Storage System → storage system link in the pop up → Overview → Block and File tab
- Storage → Data Foundation → Storage System → storage system link in the pop up → Overview → Object tab
Copy the error displayed and search it in the following section to know its severity and resolution:
| Name | Message | Description | Severity | Resolution | Procedure |
|---|---|---|---|---|---|
| | | | Warning | Fix | Inspect the user interface and log, and verify if an update is in progress. |
| | | | Warning | Fix | Inspect the user interface and log, and verify if an update is in progress. |
| | | | Critical | Fix | Remove unnecessary data or expand the cluster. |
| | | | Warning | Fix | Remove unnecessary data or expand the cluster. |
| | | | Warning | Workaround | Resolving NooBaa Bucket Error State |
| | | | Warning | Fix | Resolving NooBaa Bucket Error State |
| | | | Warning | Fix | Resolving NooBaa Bucket Error State |
| | | | Warning | Fix | |
| | | | Warning | Fix | |
| | | | Warning | Fix | |
| | | | Warning | Fix | |
| | | | Warning | Workaround | Resolving NooBaa Bucket Error State |
| | | | Warning | Fix | |
| | | | Warning | Fix | |
| | | | Warning | Fix | |
| | | Minimum required replicas for storage metadata service not available. Might affect the working of storage cluster. | Warning | Contact Red Hat support | |
| | | | Critical | Contact Red Hat support | |
| | | | Critical | Contact Red Hat support | |
| | | | Critical | Contact Red Hat support | |
| | | | Warning | Contact Red Hat support | |
| | | | Warning | Contact Red Hat support | |
| | | | Critical | Contact Red Hat support | |
| | | | Critical | Contact Red Hat support | |
| | | | Warning | Contact Red Hat support | |
| | | | Warning | Contact Red Hat support | |
| | | | Critical | Contact Red Hat support | |
| | | | Critical | Contact Red Hat support | |
| | | | Critical | Contact Red Hat support | |
| | | Disaster recovery is failing for one or a few applications. | Warning | Contact Red Hat support | |
| | | Disaster recovery is failing for the entire cluster. Mirror daemon is in unhealthy status for more than 1m. Mirroring on this cluster is not working as expected. | Critical | Contact Red Hat support | |
6.2. Resolving cluster health issues
There is a finite set of possible health messages that a Red Hat Ceph Storage cluster can raise that show in the OpenShift Data Foundation user interface. These are defined as health checks which have unique identifiers. The identifier is a terse pseudo-human-readable string that is intended to enable tools to make sense of health checks, and present them in a way that reflects their meaning. Click the health code below for more information and troubleshooting.
| Health code | Description |
|---|---|
| MON_DISK_LOW | One or more Ceph Monitors are low on disk space. |
6.2.1. MON_DISK_LOW
This alert triggers if the available space on the file system storing the monitor database, expressed as a percentage, drops below mon_data_avail_warn (default: 15%). This may indicate that some other process or user on the system is filling up the same file system used by the monitor. It may also indicate that the monitor’s database is large.
The paths to the file system differ depending on the deployment of your mons. You can find the path to where the mon is deployed in storagecluster.yaml.
Example paths:

- Mon deployed over PVC path: /var/lib/ceph/mon
- Mon deployed over hostpath: /var/lib/rook/mon
In order to clear up space, view the high usage files in the file system and choose which to delete. To view the files, run:
# du -a <path-in-the-mon-node> | sort -n -r | head -n10

Replace <path-in-the-mon-node> with the path to the file system where mons are deployed.
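For mons deployed over a PVC, one way to run this check from a client machine is to open a shell in the mon pod first. This is a sketch that reuses the rook-ceph-mon pod label used elsewhere in this guide together with the example PVC path above:

# oc rsh -n openshift-storage $(oc get po -n openshift-storage -l app=rook-ceph-mon,mon=a -oname)
# du -a /var/lib/ceph/mon | sort -n -r | head -n10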
6.3. Resolving NooBaa Bucket Error State
Procedure
- In the OpenShift Web Console, click Storage → Data Foundation.
- In the Status card of the Overview tab, click Storage System and then click the storage system link from the pop up that appears.
- Click the Object tab.
- In the Details card, click the link under System Name field.
- In the left pane, click Buckets option and search for the bucket in error state. If the bucket in error state is a namespace bucket, be sure to click the Namespace Buckets pane.
- Click on its Bucket Name. The error encountered in the bucket is displayed.
Depending on the specific error of the bucket, perform one or both of the following:
For space related errors:
- In the left pane, click Resources option.
- Click on the resource in error state.
- Scale the resource by adding more agents.
For resource health errors:
- In the left pane, click Resources option.
- Click on the resource in error state.
- Connectivity error means the backing service is not available and needs to be restored.
- For access/permissions errors, update the connection’s Access Key and Secret Key.
6.4. Resolving NooBaa Bucket Exceeding Quota State
To resolve A NooBaa Bucket Is In Exceeding Quota State error, perform one of the following:
- Cleanup some of the data on the bucket.
Increase the bucket quota by performing the following steps:
- In the OpenShift Web Console, click Storage → Data Foundation.
- In the Status card of the Overview tab, click Storage System and then click the storage system link from the pop up that appears.
- Click the Object tab.
- In the Details card, click the link under System Name field.
- In the left pane, click Buckets option and search for the bucket in error state.
- Click on its Bucket Name. The error encountered in the bucket is displayed.
- Click Bucket Policies → Edit Quota and increase the quota.
6.5. Resolving NooBaa Bucket Capacity or Quota State
Procedure
- In the OpenShift Web Console, click Storage → Data Foundation.
- In the Status card of the Overview tab, click Storage System and then click the storage system link from the pop up that appears.
- Click the Object tab.
- In the Details card, click the link under System Name field.
- In the left pane, click the Resources option and search for the PV pool resource.
- For the PV pool resource with low capacity status, click on its Resource Name.
- Edit the pool configuration and increase the number of agents.
6.6. Recovering pods
When a node (say NODE1) goes to NotReady state because of some issue, the hosted pods that use PVCs with ReadWriteOnce (RWO) access mode try to move to a second node (say NODE2) but get stuck due to a multi-attach error. In such a case, you can recover MON, OSD, and application pods by using the following steps.
Procedure
- Power off NODE1 (from the AWS or vSphere side) and ensure that NODE1 is completely down.
- Force delete the pods on NODE1 by using the following command:

  $ oc delete pod <pod-name> --grace-period=0 --force
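To list the pods that are still bound to the powered-off node before force deleting them, a field selector query such as the following sketch can help; NODE1 is the placeholder node name used above:

$ oc get pods --all-namespaces -o wide --field-selector spec.nodeName=NODE1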
6.7. Recovering from EBS volume detach
When an OSD or MON elastic block storage (EBS) volume where the OSD disk resides is detached from the worker Amazon EC2 instance, the volume gets reattached automatically within one or two minutes. However, the OSD pod gets into a CrashLoopBackOff state. To recover and bring back the pod to Running state, you must restart the EC2 instance.
6.8. Enabling and disabling debug logs for rook-ceph-operator
Enable the debug logs for the rook-ceph-operator to obtain information about failures that help in troubleshooting issues.
Procedure
- Enabling the debug logs
Edit the configmap of the rook-ceph-operator.

$ oc edit configmap rook-ceph-operator-config

Add the ROOK_LOG_LEVEL: DEBUG parameter in the rook-ceph-operator-config yaml file to enable the debug logs for rook-ceph-operator.

…
data:
  # The logging level for the operator: INFO | DEBUG
  ROOK_LOG_LEVEL: DEBUG

Now, the rook-ceph-operator logs contain the debug information.
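A non-interactive alternative is to patch the ConfigMap directly. This is a sketch and assumes the ConfigMap lives in the openshift-storage namespace:

$ oc patch configmap rook-ceph-operator-config -n openshift-storage --type merge -p '{"data":{"ROOK_LOG_LEVEL":"DEBUG"}}'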
- Disabling the debug logs
Edit the configmap of the rook-ceph-operator.

$ oc edit configmap rook-ceph-operator-config

Add the ROOK_LOG_LEVEL: INFO parameter in the rook-ceph-operator-config yaml file to disable the debug logs for rook-ceph-operator.

…
data:
  # The logging level for the operator: INFO | DEBUG
  ROOK_LOG_LEVEL: INFO
Chapter 7. Checking for Local Storage Operator deployments
Red Hat OpenShift Data Foundation clusters with Local Storage Operator are deployed using local storage devices. To find out if your existing cluster with OpenShift Data Foundation was deployed using local storage devices, use the following procedure:
Prerequisites
- OpenShift Data Foundation is installed and running in the openshift-storage namespace.
Procedure
By checking the storage class associated with your OpenShift Data Foundation cluster’s persistent volume claims (PVCs), you can tell if your cluster was deployed using local storage devices.
Check the storage class associated with OpenShift Data Foundation cluster’s PVCs with the following command:
$ oc get pvc -n openshift-storage

Check the output. For clusters with Local Storage Operator, the PVCs associated with ocs-deviceset use the storage class localblock. The output looks similar to the following:

NAME                      STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                  AGE
db-noobaa-db-0            Bound    pvc-d96c747b-2ab5-47e2-b07e-1079623748d8   50Gi       RWO            ocs-storagecluster-ceph-rbd   114s
ocs-deviceset-0-0-lzfrd   Bound    local-pv-7e70c77c                          1769Gi     RWO            localblock                    2m10s
ocs-deviceset-1-0-7rggl   Bound    local-pv-b19b3d48                          1769Gi     RWO            localblock                    2m10s
ocs-deviceset-2-0-znhk8   Bound    local-pv-e9f22cdc                          1769Gi     RWO            localblock                    2m10s
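If you only need the storage class for each PVC, a custom-columns query keeps the output short. This is a sketch of the same check:

$ oc get pvc -n openshift-storage -o custom-columns=NAME:.metadata.name,STORAGECLASS:.spec.storageClassName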
Additional Resources
- Deploying OpenShift Data Foundation using local storage devices on VMware
- Deploying OpenShift Data Foundation using local storage devices on Red Hat Virtualization
- Deploying OpenShift Data Foundation using local storage devices on bare metal
- Deploying OpenShift Data Foundation using local storage devices on IBM Power
Chapter 8. Removing failed or unwanted Ceph Object Storage devices
Failed or unwanted Ceph OSDs (Object Storage Devices) affect the performance of the storage infrastructure. Hence, to improve the reliability and resilience of the storage cluster, you must remove the failed or unwanted Ceph OSDs.
If you have any failed or unwanted Ceph OSDs to remove:
Verify the Ceph health status.
For more information see: Verifying Ceph cluster is healthy.
Based on the provisioning of the OSDs, remove failed or unwanted Ceph OSDs.
See:
If you are using local disks, you can reuse these disks after removing the old OSDs.
8.1. Verifying Ceph cluster is healthy
Storage health is visible on the Block and File and Object dashboards.
Procedure
- In the OpenShift Web Console, click Storage → Data Foundation.
- In the Status card of the Overview tab, click Storage System and then click the storage system link from the pop up that appears.
- In the Status card of the Block and File tab, verify that Storage Cluster has a green tick.
- In the Details card, verify that the cluster information is displayed.
8.2. Removing failed or unwanted Ceph OSDs in dynamically provisioned Red Hat OpenShift Data Foundation
Follow the steps in the procedure to remove the failed or unwanted Ceph OSDs in dynamically provisioned Red Hat OpenShift Data Foundation.
Scaling down of cluster is supported only with the help of the Red Hat support team.
- Removing an OSD when the Ceph component is not in a healthy state can result in data loss.
- Removing two or more OSDs at the same time results in data loss.
Prerequisites
- Check if Ceph is healthy. For more information see Verifying Ceph cluster is healthy.
- Ensure that no alerts are firing and that no rebuilding process is in progress.
Procedure
Scale down the OSD deployment.
# oc scale deployment rook-ceph-osd-<osd-id> --replicas=0

Get the osd-prepare pod for the Ceph OSD to be removed.

# oc get deployment rook-ceph-osd-<osd-id> -oyaml | grep ceph.rook.io/pvc

Delete the osd-prepare pod.

# oc delete -n openshift-storage pod rook-ceph-osd-prepare-<pvc-from-above-command>-<pod-suffix>

Remove the failed OSD from the cluster.
# failed_osd_id=<osd-id>
# oc process -n openshift-storage ocs-osd-removal -p FAILED_OSD_IDS=${failed_osd_id} | oc create -f -

where FAILED_OSD_ID is the integer in the pod name immediately after the rook-ceph-osd prefix.

Verify that the OSD is removed successfully by checking the logs.

# oc logs -n openshift-storage ocs-osd-removal-${failed_osd_id}-<pod-suffix>

Optional: If you get an error as cephosd:osd.0 is NOT ok to destroy from the ocs-osd-removal-job pod in OpenShift Container Platform, see Troubleshooting the error cephosd:osd.0 is NOT ok to destroy while removing failed or unwanted Ceph OSDs.

Delete the OSD deployment.

# oc delete deployment rook-ceph-osd-<osd-id>
Verification step
To check if the OSD is deleted successfully, run:
# oc get pod -n openshift-storage ocs-osd-removal-${failed_osd_id}-<pod-suffix>

This command must return the status as Completed.
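As a worked sketch, removing a hypothetical OSD with ID 0 chains the steps above as follows; the IDs are illustrative, not values from your cluster:

# oc scale deployment rook-ceph-osd-0 --replicas=0 -n openshift-storage
# failed_osd_id=0
# oc process -n openshift-storage ocs-osd-removal -p FAILED_OSD_IDS=${failed_osd_id} | oc create -f -
# oc get pod -n openshift-storage | grep ocs-osd-removal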
8.3. Removing failed or unwanted Ceph OSDs provisioned using local storage devices
You can remove failed or unwanted Ceph OSDs provisioned using local storage devices by following the steps in the procedure.
Scaling down of cluster is supported only with the help of the Red Hat support team.
- Removing an OSD when the Ceph component is not in a healthy state can result in data loss.
- Removing two or more OSDs at the same time results in data loss.
Prerequisites
- Check if Ceph is healthy. For more information see Verifying Ceph cluster is healthy.
- Ensure that no alerts are firing and that no rebuilding process is in progress.
Procedure
Forcibly mark the OSD down by scaling the replicas on the OSD deployment to 0. You can skip this step if the OSD is already down due to failure.
# oc scale deployment rook-ceph-osd-<osd-id> --replicas=0

Remove the failed OSD from the cluster.
# failed_osd_id=<osd_id>
# oc process -n openshift-storage ocs-osd-removal -p FAILED_OSD_IDS=${failed_osd_id} | oc create -f -

where FAILED_OSD_ID is the integer in the pod name immediately after the rook-ceph-osd prefix.

Verify that the OSD is removed successfully by checking the logs.

# oc logs -n openshift-storage ocs-osd-removal-${failed_osd_id}-<pod-suffix>

Optional: If you get an error as cephosd:osd.0 is NOT ok to destroy from the ocs-osd-removal-job pod in OpenShift Container Platform, see Troubleshooting the error cephosd:osd.0 is NOT ok to destroy while removing failed or unwanted Ceph OSDs.

Delete persistent volume claim (PVC) resources associated with the failed OSD.
Get the PVC associated with the failed OSD.

# oc get -n openshift-storage -o yaml deployment rook-ceph-osd-<osd-id> | grep ceph.rook.io/pvc

Get the persistent volume (PV) associated with the PVC.

# oc get -n openshift-storage pvc <pvc-name>

Get the failed device name.

# oc get pv <pv-name-from-above-command> -oyaml | grep path

Get the prepare-pod associated with the failed OSD.

# oc describe -n openshift-storage pvc ocs-deviceset-0-0-nvs68 | grep Mounted

Delete the osd-prepare pod before removing the associated PVC.

# oc delete -n openshift-storage pod <osd-prepare-pod-from-above-command>

Delete the PVC associated with the failed OSD.

# oc delete -n openshift-storage pvc <pvc-name-from-step-a>
Remove the failed device entry from the LocalVolume custom resource (CR).

Log in to the node with the failed device.

# oc debug node/<node_with_failed_osd>

Record the /dev/disk/by-id/<id> for the failed device name.

# ls -alh /mnt/local-storage/localblock/
Optional: If the Local Storage Operator is used for provisioning the OSD, log in to the machine with {osd-id} and remove the device symlink.

# oc debug node/<node_with_failed_osd>

Get the OSD symlink for the failed device name.

# ls -alh /mnt/local-storage/localblock

Remove the symlink.

# rm /mnt/local-storage/localblock/<failed-device-name>
- Delete the PV associated with the OSD.

  # oc delete pv <pv-name>
Verification step
To check if the OSD is deleted successfully, run:
# oc get pod -n openshift-storage ocs-osd-removal-${failed_osd_id}-<pod-suffix>

This command must return the status as Completed.
8.4. Troubleshooting the error cephosd:osd.0 is NOT ok to destroy while removing failed or unwanted Ceph OSDs
If you get an error as cephosd:osd.0 is NOT ok to destroy from the ocs-osd-removal-job pod in OpenShift Container Platform, run the OSD removal job with the FORCE_OSD_REMOVAL option to move the OSD to a destroyed state.

# oc process -n openshift-storage ocs-osd-removal -p FORCE_OSD_REMOVAL=true -p FAILED_OSD_IDS=${failed_osd_id} | oc create -f -

You must use the FORCE_OSD_REMOVAL option only if all the PGs are in active state. If not, PGs must either complete the backfilling or be further investigated to ensure that they are active.
Chapter 9. Troubleshooting and deleting remaining resources during Uninstall
Occasionally some of the custom resources managed by an operator may remain in "Terminating" status waiting on the finalizer to complete, although you have performed all the required cleanup tasks. In such an event you need to force the removal of such resources. If you do not do so, the resources remain in the "Terminating" state even after you have performed all the uninstall steps.
Check if the openshift-storage namespace is stuck in Terminating state upon deletion.
$ oc get project -n <namespace>

Output:

NAME                DISPLAY NAME   STATUS
openshift-storage                  Terminating

Check for the NamespaceFinalizersRemaining and NamespaceContentRemaining messages in the STATUS section of the command output and perform the next step for each of the listed resources.

$ oc get project openshift-storage -o yaml

Example output:

Delete all the remaining resources listed in the previous step.
For each of the resources to be deleted, do the following:
Get the object kind of the resource which needs to be removed. See the message in the above output.
Example:

message: Some content in the namespace has finalizers remaining: cephobjectstoreuser.ceph.rook.io

Here cephobjectstoreuser.ceph.rook.io is the object kind.

Get the object name corresponding to the object kind.

$ oc get <object-kind> -n <project-name>

Example:

$ oc get cephobjectstoreusers.ceph.rook.io -n openshift-storage

Example output:

NAME                           AGE
noobaa-ceph-objectstore-user   26h

Patch the resources.

$ oc patch -n <project-name> <object-kind>/<object-name> --type=merge -p '{"metadata": {"finalizers":null}}'

Example:

$ oc patch -n openshift-storage cephobjectstoreusers.ceph.rook.io/noobaa-ceph-objectstore-user --type=merge -p '{"metadata": {"finalizers":null}}'

Output:

cephobjectstoreuser.ceph.rook.io/noobaa-ceph-objectstore-user patched
Verify that the openshift-storage project is deleted.
$ oc get project openshift-storage

Output:

Error from server (NotFound): namespaces "openshift-storage" not found

If the issue persists, reach out to Red Hat Support.
Chapter 10. Troubleshooting CephFS PVC creation in external mode
If you have updated the Red Hat Ceph Storage cluster from a version lower than 4.1.1 to the latest release, and it is not a freshly deployed cluster, you must manually set the application type for the CephFS pool on the Red Hat Ceph Storage cluster to enable CephFS PVC creation in external mode.
Check for CephFS PVC stuck in Pending status.

# oc get pvc -n <namespace>

Example output:

NAME                    STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS                         AGE
ngx-fs-pxknkcix20-pod   Pending                                      ocs-external-storagecluster-cephfs   28h
[...]

Check the describe output to see the events for the respective PVC.

The expected error message is cephfs_metadata/csi.volumes.default/csi.volume.pvc-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx: (1) Operation not permitted)

# oc describe pvc ngx-fs-pxknkcix20-pod -n nginx-file

Example output:

Check the settings for the <cephfs metadata pool name> (here cephfs_metadata) and <cephfs data pool name> (here cephfs_data). For running the command, you will need jq preinstalled in the Red Hat Ceph Storage client node.

Set the application type for the CephFS pool.
Run the following commands on the Red Hat Ceph Storage client node:

# ceph osd pool application set <cephfs metadata pool name> cephfs metadata cephfs
# ceph osd pool application set <cephfs data pool name> cephfs data cephfs
Verify if the settings are applied.

Check the CephFS PVC status again. The PVC should now be in Bound state.

# oc get pvc -n <namespace>

Example output:

NAME                    STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                         AGE
ngx-fs-pxknkcix20-pod   Bound    pvc-1ac0c6e6-9428-445d-bbd6-1284d54ddb47   1Mi        RWO            ocs-external-storagecluster-cephfs   29h
[...]
Chapter 11. Restoring the monitor pods in OpenShift Data Foundation
Restore the monitor pods if all three of them go down and OpenShift Data Foundation is not able to recover the monitor pods automatically.
Procedure
Scale down the rook-ceph-operator and ocs-operator deployments.

# oc scale deployment rook-ceph-operator --replicas=0 -n openshift-storage
# oc scale deployment ocs-operator --replicas=0 -n openshift-storage

Create a backup of all deployments in the openshift-storage namespace.

# mkdir backup
# cd backup
# oc project openshift-storage
# for d in $(oc get deployment|awk -F' ' '{print $1}'|grep -v NAME); do echo $d;oc get deployment $d -o yaml > oc_get_deployment.${d}.yaml; done

Patch the OSD deployments to remove the livenessProbe parameter, and run it with the command parameter as sleep.

# for i in $(oc get deployment -l app=rook-ceph-osd -oname);do oc patch ${i} -n openshift-storage --type='json' -p '[{"op":"remove", "path":"/spec/template/spec/containers/0/livenessProbe"}]' ; oc patch ${i} -n openshift-storage -p '{"spec": {"template": {"spec": {"containers": [{"name": "osd", "command": ["sleep", "infinity"], "args": []}]}}}}' ; done

Retrieve the monstore cluster map from all the OSDs.

Create the recover_mon.sh script (a sketch follows below).
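The contents of the script are not reproduced above. As a rough illustration only, a mon-store recovery script modelled on the upstream Rook disaster-recovery procedure loops over the OSD pods and rebuilds the mon store with ceph-objectstore-tool. Treat the following as a sketch, not the supported Red Hat script; the ceph-osd-id label and data path are assumptions:

#!/bin/bash
# Illustrative sketch: rebuild the mon store from every OSD into /tmp/monstore.
ms=/tmp/monstore

rm -rf $ms
mkdir $ms

for osd_pod in $(oc get po -l app=rook-ceph-osd -oname -n openshift-storage); do
  echo "Starting with pod: $osd_pod"
  podname=$(echo $osd_pod | sed 's/pod\///g')

  # Start from a clean monstore directory inside the OSD pod.
  oc exec $osd_pod -- rm -rf $ms
  oc cp $ms $podname:$ms

  rm -rf $ms
  mkdir $ms

  # Update the mon store with the cluster map held by this OSD.
  # The ceph-osd-id label is an assumption about how the OSD ID is exposed.
  oc exec $osd_pod -- ceph-objectstore-tool --type bluestore \
    --data-path /var/lib/ceph/osd/ceph-$(oc get $osd_pod -ojsonpath='{ .metadata.labels.ceph\-osd\-id }') \
    --op update-mon-db --no-mon-config --mon-store-path $ms

  # Pull the updated monstore back to the local machine.
  oc cp $podname:$ms $ms
done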
Run the recover_mon.sh script.

# chmod +x recover_mon.sh
# ./recover_mon.sh
Patch the MON deployments, and run it with the command parameter as sleep.

Edit the MON deployments.

# for i in $(oc get deployment -l app=rook-ceph-mon -oname);do oc patch ${i} -n openshift-storage -p '{"spec": {"template": {"spec": {"containers": [{"name": "mon", "command": ["sleep", "infinity"], "args": []}]}}}}'; done

Patch the MON deployments to increase the initialDelaySeconds.

# oc get deployment rook-ceph-mon-a -o yaml | sed "s/initialDelaySeconds: 10/initialDelaySeconds: 2000/g" | oc replace -f -
# oc get deployment rook-ceph-mon-b -o yaml | sed "s/initialDelaySeconds: 10/initialDelaySeconds: 2000/g" | oc replace -f -
# oc get deployment rook-ceph-mon-c -o yaml | sed "s/initialDelaySeconds: 10/initialDelaySeconds: 2000/g" | oc replace -f -
Copy the previously retrieved monstore to the mon-a pod.

# oc cp /tmp/monstore/ $(oc get po -l app=rook-ceph-mon,mon=a -oname |sed 's/pod\///g'):/tmp/

Navigate into the MON pod and change the ownership of the retrieved monstore.

# oc rsh $(oc get po -l app=rook-ceph-mon,mon=a -oname)
# chown -R ceph:ceph /tmp/monstore

Copy the keyring template file before rebuilding the mon db.

# oc rsh $(oc get po -l app=rook-ceph-mon,mon=a -oname)
# cp /etc/ceph/keyring-store/keyring /tmp/keyring

Identify the keyring of all other Ceph daemons (MGR, MDS, RGW, Crash, CSI and CSI provisioners) from their respective secrets.

Example keyring file, /etc/ceph/ceph.client.admin.keyring:

Important:
- For the client.csi related keyring, refer to the previous keyring file output and add the default caps after fetching the key from its respective OpenShift Data Foundation secret.
- OSD keyring is added automatically post recovery.
Navigate into the mon-a pod, and verify that the monstore has monmap.

Navigate into the mon-a pod.

# oc rsh $(oc get po -l app=rook-ceph-mon,mon=a -oname)

Verify that the monstore has monmap.

# ceph-monstore-tool /tmp/monstore get monmap -- --out /tmp/monmap
# monmaptool /tmp/monmap --print
Optional: If the monmap is missing, then create a new monmap.

# monmaptool --create --add <mon-a-id> <mon-a-ip> --add <mon-b-id> <mon-b-ip> --add <mon-c-id> <mon-c-ip> --enable-all-features --clobber /root/monmap --fsid <fsid>

<mon-a-id>
Is the ID of the mon-a pod.
<mon-a-ip>
Is the IP address of the mon-a pod.
<mon-b-id>
Is the ID of the mon-b pod.
<mon-b-ip>
Is the IP address of the mon-b pod.
<mon-c-id>
Is the ID of the mon-c pod.
<mon-c-ip>
Is the IP address of the mon-c pod.
<fsid>
Is the file system ID.

Verify the monmap.

# monmaptool /root/monmap --print

Import the monmap.
Important: Use the previously created keyring file.

# ceph-monstore-tool /tmp/monstore rebuild -- --keyring /tmp/keyring --monmap /root/monmap
# chown -R ceph:ceph /tmp/monstore

Create a backup of the old store.db file.

# mv /var/lib/ceph/mon/ceph-a/store.db /var/lib/ceph/mon/ceph-a/store.db.corrupted
# mv /var/lib/ceph/mon/ceph-b/store.db /var/lib/ceph/mon/ceph-b/store.db.corrupted
# mv /var/lib/ceph/mon/ceph-c/store.db /var/lib/ceph/mon/ceph-c/store.db.corrupted

Copy the rebuilt store.db file to the monstore directory.

# mv /tmp/monstore/store.db /var/lib/ceph/mon/ceph-a/store.db
# chown -R ceph:ceph /var/lib/ceph/mon/ceph-a/store.db

After rebuilding the monstore directory, copy the store.db file from local to the rest of the MON pods.
# oc cp $(oc get po -l app=rook-ceph-mon,mon=a -oname | sed 's/pod\///g'):/var/lib/ceph/mon/ceph-a/store.db /tmp/store.db
# oc cp /tmp/store.db $(oc get po -l app=rook-ceph-mon,mon=<id> -oname | sed 's/pod\///g'):/var/lib/ceph/mon/ceph-<id>

<id>
Is the ID of the MON pod.

Navigate into the rest of the MON pods and change the ownership of the copied monstore.

# oc rsh $(oc get po -l app=rook-ceph-mon,mon=<id> -oname)
# chown -R ceph:ceph /var/lib/ceph/mon/ceph-<id>/store.db

<id>
Is the ID of the MON pod.
Revert the patched changes.
For MON deployments:
# oc replace --force -f <mon-deployment.yaml>
<mon-deployment.yaml> - Is the MON deployment YAML file
For OSD deployments:
# oc replace --force -f <osd-deployment.yaml>
<osd-deployment.yaml> - Is the OSD deployment YAML file
For MGR deployments:
# oc replace --force -f <mgr-deployment.yaml>
<mgr-deployment.yaml> - Is the MGR deployment YAML file
Important
Ensure that the MON, MGR, and OSD pods are up and running.
Scale up the rook-ceph-operator and ocs-operator deployments.
# oc -n openshift-storage scale deployment ocs-operator --replicas=1
Verification steps
Check the Ceph status to confirm that CephFS is running.
# ceph -s
Check the Multicloud Object Gateway (MCG) status. It should be active, and the backingstore and bucketclass should be in the Ready state.
noobaa status -n openshift-storage
Important
If the MCG is not in the active state, and the backingstore and bucketclass are not in the Ready state, you need to restart all the MCG related pods. For more information, see Section 11.1, “Restoring the Multicloud Object Gateway”.
11.1. Restoring the Multicloud Object Gateway
If the Multicloud Object Gateway (MCG) is not in the active state, and the backingstore and bucketclass are not in the Ready state, you need to restart all the MCG related pods, and then check the MCG status to confirm that the MCG is back up and running.
Procedure
Restart all the pods related to the MCG.
# oc delete pods <noobaa-operator> -n openshift-storage
# oc delete pods <noobaa-core> -n openshift-storage
# oc delete pods <noobaa-endpoint> -n openshift-storage
# oc delete pods <noobaa-db> -n openshift-storage
<noobaa-operator> - Is the name of the MCG operator pod
<noobaa-core> - Is the name of the MCG core pod
<noobaa-endpoint> - Is the name of the MCG endpoint pod
<noobaa-db> - Is the name of the MCG db pod
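The pod names include generated suffixes and differ per cluster. If you are unsure of the current names, you can list the MCG pods first (a sketch, assuming the default openshift-storage namespace):
# oc get pods -n openshift-storage | grep noobaa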
If the RADOS Object Gateway (RGW) is configured, restart the pod.
# oc delete pods <rgw-pod> -n openshift-storage
<rgw-pod> - Is the name of the RGW pod
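To find the RGW pod name, you can filter the pod list for the RADOS Object Gateway; this is a sketch that assumes the default Rook naming (rook-ceph-rgw-*):
# oc get pods -n openshift-storage | grep rook-ceph-rgw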
Chapter 12. Restoring ceph-monitor quorum in OpenShift Data Foundation
In some circumstances, the ceph-mons might lose quorum. If the mons cannot form quorum again, there is a manual procedure to restore it. The only requirement is that at least one mon must be healthy. The following steps remove the unhealthy mons from quorum, enable you to form a quorum again with a single mon, and then bring the quorum back to the original size.
For example, if you have three mons and lose quorum, you need to remove the two bad mons from quorum, notify the good mon that it is the only mon in quorum, and then restart the good mon.
Procedure
Stop the rook-ceph-operator so that the mons are not failed over while you modify the monmap.
# oc -n openshift-storage scale deployment rook-ceph-operator --replicas=0
Inject a new monmap.
Warning
You must inject the monmap very carefully. If run incorrectly, your cluster could be permanently destroyed. The Ceph monmap keeps track of the mon quorum. The monmap is updated to contain only the healthy mon. In this example, the healthy mon is rook-ceph-mon-b, while the unhealthy mons are rook-ceph-mon-a and rook-ceph-mon-c.
Take a backup of the current rook-ceph-mon-b Deployment:
# oc -n openshift-storage get deployment rook-ceph-mon-b -o yaml > rook-ceph-mon-b-deployment.yaml
Open the YAML file and copy the command and arguments from the mon container (see the containers list in the deployment YAML). This is needed for the monmap changes.
Clean up the copied command and args fields to form a single pastable command (a hypothetical sketch follows the note below).
Note
Make sure to remove the single quotes around the --log-stderr-prefix flag and the parentheses around the variables being passed (ROOK_CEPH_MON_HOST, ROOK_CEPH_MON_INITIAL_MEMBERS, and ROOK_POD_IP).
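The exact command must come from your own rook-ceph-mon-b Deployment and differs between versions and clusters; purely as a hypothetical sketch of its shape after cleanup (the flags and values below are placeholders, not the documented command), it tends to look similar to:
ceph-mon --fsid=<fsid> --keyring=/etc/ceph/keyring-store/keyring --default-log-to-file=false --mon-host=$ROOK_CEPH_MON_HOST --mon-initial-members=$ROOK_CEPH_MON_INITIAL_MEMBERS --id=b --foreground --public-addr=$ROOK_POD_IP --setuser=ceph --setgroup=ceph --log-stderr-prefix=debug
Use your own Deployment's command and arguments verbatim; this sketch only illustrates removing the quotes and parentheses described in the note.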
Patch the rook-ceph-mon-b Deployment to stop this mon from running without deleting the mon pod.
# oc -n openshift-storage patch deployment rook-ceph-mon-b --type='json' -p '[{"op":"remove", "path":"/spec/template/spec/containers/0/livenessProbe"}]'
# oc -n openshift-storage patch deployment rook-ceph-mon-b -p '{"spec": {"template": {"spec": {"containers": [{"name": "mon", "command": ["sleep", "infinity"], "args": []}]}}}}'
Perform the following steps on the mon-b pod:
Connect to the pod of a healthy mon and run the following commands:
# oc -n openshift-storage exec -it <mon-pod> bash
Set the variable.
# monmap_path=/tmp/monmap
Extract the monmap to a file, by pasting the ceph-mon command from the good mon deployment and adding the --extract-monmap=${monmap_path} flag.
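For illustration only, the extraction step appends the flag to the cleaned-up command from your deployment (represented here by a placeholder, not the actual documented command):
ceph-mon [flags copied from your mon deployment] --extract-monmap=${monmap_path}
Later in this procedure, the same pasted command with --inject-monmap=${monmap_path} is used to write the edited monmap back.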
Review the contents of the monmap.
# monmaptool --print /tmp/monmap
Remove the bad mons from the monmap.
# monmaptool ${monmap_path} --rm <bad_mon>
In this example, we remove mon a and mon c:
# monmaptool ${monmap_path} --rm a
# monmaptool ${monmap_path} --rm c
Inject the modified monmap into the good mon, by pasting the ceph-mon command and adding the --inject-monmap=${monmap_path} flag in the same way.
- Exit the shell to continue.
Edit the Rook configmaps.
Edit the configmap that the operator uses to track the mons.
# oc -n openshift-storage edit configmap rook-ceph-mon-endpoints
Verify that in the data element you see three mons such as the following (or more, depending on your mon count):
data: a=10.100.35.200:6789;b=10.100.13.242:6789;c=10.100.35.12:6789
Delete the bad mons from the list to end up with a single good mon. For example:
data: b=10.100.13.242:6789
- Save the file and exit.
Now, you need to adapt a Secret that is used for the mons and other components.
Set a value for the variable good_mon_id.
For example:
# good_mon_id=b
Use the oc patch command to patch the rook-ceph-config secret and update the two key/value pairs mon_host and mon_initial_members.
# mon_host=$(oc -n openshift-storage get svc rook-ceph-mon-b -o jsonpath='{.spec.clusterIP}')
# oc -n openshift-storage patch secret rook-ceph-config -p '{"stringData": {"mon_host": "[v2:'"${mon_host}"':3300,v1:'"${mon_host}"':6789]", "mon_initial_members": "'"${good_mon_id}"'"}}'
Note
If you are using hostNetwork: true, you need to replace the mon_host variable with the node IP that the mon is pinned to (nodeSelector). This is because there is no rook-ceph-mon-* service created in that mode.
Restart the mon.
You need to restart the good mon pod with the original ceph-mon command to pick up the changes.
Use the oc replace command on the backup of the mon deployment YAML file:
# oc replace --force -f rook-ceph-mon-b-deployment.yaml
Note
The --force option deletes the deployment and creates a new one.
Verify the status of the cluster.
The status should show one mon in quorum. If the status looks good, your cluster should be healthy again.
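For example, you can run the Ceph status command used earlier in this guide from a pod that has the ceph CLI and cluster configuration available (such as the rook-ceph-tools toolbox pod, if it is deployed):
# ceph -s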
Delete the two mon deployments that are no longer expected to be in quorum.
For example:
# oc delete deploy <rook-ceph-mon-1>
# oc delete deploy <rook-ceph-mon-2>
In this example, the deployments to be deleted are rook-ceph-mon-a and rook-ceph-mon-c.
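A concrete form of this step for the example in this chapter (assuming the deployments are in the openshift-storage namespace) would be:
# oc -n openshift-storage delete deploy rook-ceph-mon-a rook-ceph-mon-c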
Restart the operator.
Start the rook operator again to resume monitoring the health of the cluster.
Note
It is safe to ignore errors stating that a number of resources already exist.
# oc -n openshift-storage scale deployment rook-ceph-operator --replicas=1
The operator automatically adds more mons to increase the quorum size again, depending on the mon count.
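To watch the mons being recreated, you can follow the mon pods by the app=rook-ceph-mon label used earlier in this guide (an optional check, not part of the documented procedure):
# oc -n openshift-storage get pods -l app=rook-ceph-mon -w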
Chapter 13. Enabling the Red Hat OpenShift Data Foundation console plugin
Enable the console plugin option if it was not automatically enabled after you installed the OpenShift Data Foundation Operator. The console plugin provides a custom interface that is included in the Web Console. You can enable the console plugin option either from the graphical user interface (GUI) or command-line interface.
Prerequisites
- You have administrative access to the OpenShift Web Console.
- OpenShift Data Foundation Operator is installed and running in the openshift-storage namespace.
Procedure
- From user interface
- In the OpenShift Web Console, click Operators → Installed Operators to view all the installed operators.
- Ensure that the Project selected is openshift-storage.
- Click on the OpenShift Data Foundation operator.
Enable the console plugin option.
- In the Details tab, click the pencil icon under Console plugin.
- Select Enable, and click Save.
- From command-line interface
Execute the following command to enable the console plugin option:
$ oc patch console.operator cluster -n openshift-storage --type json -p '[{"op": "add", "path": "/spec/plugins", "value": ["odf-console"]}]'
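Optionally, you can confirm from the CLI that odf-console now appears in the console plugins list (an additional check, not part of the documented procedure):
$ oc get console.operator cluster -o jsonpath='{.spec.plugins}'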
Verification steps
- After the console plugin option is enabled, a pop-up with the message Web console update is available appears on the GUI. Click Refresh web console from this pop-up for the console changes to take effect.
- In the Web Console, navigate to Storage and verify that Data Foundation is available.
Chapter 14. Changing resources for the OpenShift Data Foundation components
When you install OpenShift Data Foundation, it comes with pre-defined resources that the OpenShift Data Foundation pods can consume. In some situations with higher I/O load, you might need to increase these limits.
- To change the CPU and memory resources on the rook-ceph pods, see Section 14.1, “Changing the CPU and memory resources on the rook-ceph pods”.
- To tune the resources for the Multicloud Object Gateway (MCG), see Section 14.2, “Tuning the resources for the MCG”.
14.1. Changing the CPU and memory resources on the rook-ceph pods
When you install OpenShift Data Foundation, it comes with pre-defined CPU and memory resources for the rook-ceph pods. You can manually increase these values according to the requirements.
You can change the CPU and memory resources on the following pods:
- mgr
- mds
- rgw
The following example illustrates how to change the CPU and memory resources on the rook-ceph pods. In this example, the existing MDS pod values for CPU and memory are increased from 1 and 4Gi to 2 and 8Gi, respectively.
Edit the storage cluster:
# oc edit storagecluster -n openshift-storage <storagecluster_name>
<storagecluster_name> - Specify the name of the storage cluster.
Example 14.1. Example
# oc edit storagecluster -n openshift-storage ocs-storagecluster
Add the following lines to the storage cluster Custom Resource (CR):
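Based on the oc patch example that follows, the resources section for the mds pods takes a form like the following sketch (using the 2 CPU and 8Gi values from this example; verify the exact layout against your storage cluster CR):
spec:
  resources:
    mds:
      limits:
        cpu: "2"
        memory: "8Gi"
      requests:
        cpu: "2"
        memory: "8Gi"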
- Save the changes and exit the editor.
Alternatively, run the oc patch command to change the CPU and memory values of the mds pod:
# oc patch -n openshift-storage storagecluster <storagecluster_name> --type merge \
  --patch '{"spec": {"resources": {"mds": {"limits": {"cpu": "2","memory": "8Gi"},"requests": {"cpu": "2","memory": "8Gi"}}}}}'
<storagecluster_name> - Specify the name of the storage cluster.
Example 14.2. Example
# oc patch -n openshift-storage storagecluster ocs-storagecluster \
  --type merge \
  --patch '{"spec": {"resources": {"mds": {"limits": {"cpu": "2","memory": "8Gi"},"requests": {"cpu": "2","memory": "8Gi"}}}}}'
14.2. Tuning the resources for the MCG
The default configuration for the Multicloud Object Gateway (MCG) is optimized for low resource consumption and not performance. For more information on how to tune the resources for the MCG, see the Red Hat Knowledgebase solution Performance tuning guide for Multicloud Object Gateway (NooBaa).