Chapter 12. Reclaiming space on target volumes
Deleted files or chunks of zeroed data sometimes continue to take up storage space on the Ceph cluster, which results in inaccurate reporting of the available storage space. The reclaim space operation removes such discrepancies by executing the following operations on the target volume:
fstrim
- This operation is used on volumes that are in Filesystem mode and only if the volume is mounted to a pod at the time of execution of the reclaim space operation.
rbd sparsify
- This operation is used when the volume is not attached to any pods and reclaims the space occupied by chunks of 4M-sized zeroed data.
- Only the Ceph RBD volumes support the reclaim space operation.
- The reclaim space operation involves a performance penalty when it is being executed.
You can use one of the following methods to reclaim the space:
- Enabling reclaim space operation by annotating PersistentVolumeClaims (recommended method)
- Enabling reclaim space operation using ReclaimSpaceJob
- Enabling reclaim space operation using ReclaimSpaceCronJob
12.1. Enabling reclaim space operation by annotating PersistentVolumeClaims
Use this procedure to annotate a persistent volume claim (PVC) so that the reclaim space operation is invoked automatically based on a given schedule.
- The schedule value is in the same format as the Kubernetes CronJobs, which sets the time and/or interval of the recurring operation request.
- Recommended schedule interval is @weekly. If the schedule interval value is empty or in an invalid format, then the default schedule value is set to @weekly. Do not schedule multiple ReclaimSpace operations @weekly or at the same time.
- Minimum supported interval between each scheduled operation is at least 24 hours. For example, @daily (At 00:00 every day) or 0 3 * * * (At 3:00 every day).
- Schedule the ReclaimSpace operation during off-peak, maintenance window, or the interval when the workload input/output is expected to be low.
- ReclaimSpaceCronJob is recreated when the schedule is modified. It is automatically deleted when the annotation is removed.
Procedure
1. Get the PVC details.
   $ oc get pvc data-pvc
   NAME       STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                  AGE
   data-pvc   Bound    pvc-f37b8582-4b04-4676-88dd-e1b95c6abf74   1Gi        RWO            ocs-storagecluster-ceph-rbd   20h
2. Add annotation reclaimspace.csiaddons.openshift.io/schedule=@monthly to the PVC to create reclaimspacecronjob.
   $ oc annotate pvc data-pvc "reclaimspace.csiaddons.openshift.io/schedule=@monthly"
   persistentvolumeclaim/data-pvc annotated
3. Verify that reclaimspacecronjob is created in the format, "<pvc-name>-xxxxxxx".
   $ oc get reclaimspacecronjobs.csiaddons.openshift.io
   NAME                  SCHEDULE   SUSPEND   ACTIVE   LASTSCHEDULE   AGE
   data-pvc-1642663516   @monthly                                     3s
4. Modify the schedule to run this job automatically.
   $ oc annotate pvc data-pvc "reclaimspace.csiaddons.openshift.io/schedule=@weekly" --overwrite=true
   persistentvolumeclaim/data-pvc annotated
5. Verify that the schedule for reclaimspacecronjob has been modified.
   $ oc get reclaimspacecronjobs.csiaddons.openshift.io
   NAME                  SCHEDULE   SUSPEND   ACTIVE   LASTSCHEDULE   AGE
   data-pvc-1642664617   @weekly                                      3s
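If the PVC is created from a manifest, the same annotation can be declared in the manifest instead of being added afterwards with oc annotate. The following is a minimal sketch, assuming a PVC that matches the data-pvc sample output in this procedure (1Gi, RWO, ocs-storagecluster-ceph-rbd). The size, access mode, and storage class are assumptions to adjust for your environment.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-pvc
  annotations:
    # Same key as in the oc annotate command; the value uses the
    # Kubernetes CronJob schedule format. @weekly is the recommended interval.
    reclaimspace.csiaddons.openshift.io/schedule: "@weekly"
spec:
  accessModes:
    - ReadWriteOnce          # assumed from the RWO column in the sample output
  storageClassName: ocs-storagecluster-ceph-rbd
  resources:
    requests:
      storage: 1Gi
```

After applying such a manifest, you can check whether the corresponding reclaimspacecronjob exists by using the oc get reclaimspacecronjobs.csiaddons.openshift.io command from this procedure.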
12.2. Disabling reclaim space for a specific PersistentVolumeClaim
To disable reclaim space for a specific PersistentVolumeClaim (PVC), modify the associated ReclaimSpaceCronJob custom resource (CR).
1. Identify the ReclaimSpaceCronJob CR for the PVC you want to disable reclaim space on:
   $ oc get reclaimspacecronjobs -o jsonpath='{range .items[?(@.spec.jobTemplate.spec.target.persistentVolumeClaim=="<PVC_NAME>")]}{.metadata.name}{"\n"}{end}'
   Replace "<PVC_NAME>" with the name of the PVC.
2. Apply the following to the ReclaimSpaceCronJob CR from step 1 to disable the reclaim space:
   a. Update the csiaddons.openshift.io/state annotation from "managed" to "unmanaged":
      $ oc annotate reclaimspacecronjobs <RECLAIMSPACECRONJOB_NAME> "csiaddons.openshift.io/state=unmanaged" --overwrite=true
      Replace <RECLAIMSPACECRONJOB_NAME> with the name of the ReclaimSpaceCronJob CR.
   b. Add suspend: true under the spec field:
      $ oc patch reclaimspacecronjobs <RECLAIMSPACECRONJOB_NAME> -p '{"spec": {"suspend": true}}' --type=merge
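For orientation, the following sketch shows roughly what the relevant parts of the ReclaimSpaceCronJob CR are expected to look like after both changes. It is an excerpt under assumptions rather than real cluster output: the CR name and PVC name are reused from the earlier sample output for illustration, and other fields and generated metadata are omitted.

```yaml
apiVersion: csiaddons.openshift.io/v1alpha1
kind: ReclaimSpaceCronJob
metadata:
  name: data-pvc-1642664617          # illustrative name only
  annotations:
    # Changed from "managed" by the oc annotate command
    csiaddons.openshift.io/state: unmanaged
spec:
  # Added by the oc patch command
  suspend: true
  schedule: '@weekly'
  jobTemplate:
    spec:
      target:
        persistentVolumeClaim: data-pvc
```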
12.3. Enabling reclaim space operation using ReclaimSpaceJob
ReclaimSpaceJob is a namespaced custom resource (CR) designed to invoke reclaim space operation on the target volume. This is a one time method that immediately starts the reclaim space operation. You have to repeat the creation of ReclaimSpaceJob CR to repeat the reclaim space operation when required.
- Recommended interval between the reclaim space operations is weekly.
- Ensure that the minimum interval between each operation is at least 24 hours.
- Schedule the reclaim space operation during off-peak, maintenance window, or when the workload input/output is expected to be low.
Procedure
1. Create and apply the following custom resource for reclaim space operation:
   apiVersion: csiaddons.openshift.io/v1alpha1
   kind: ReclaimSpaceJob
   metadata:
     name: sample-1
   spec:
     target:
       persistentVolumeClaim: pvc-1
     timeout: 360
   where,
   target
   - Indicates the volume target on which the operation is performed.
   persistentVolumeClaim
   - Name of the PersistentVolumeClaim.
   backOfflimit
   - Specifies the maximum number of retries before marking the reclaim space operation as failed. The default value is 6. The allowed maximum and minimum values are 60 and 0 respectively.
   retryDeadlineSeconds
   - Specifies the duration, in seconds, within which the operation might be retried. It is relative to the start time. The value must be a positive integer. The default value is 600 seconds and the allowed maximum value is 1800 seconds.
   timeout
   - Specifies the timeout in seconds for the grpc request sent to the CSI driver. If the timeout value is not specified, it defaults to the value of global reclaimspace timeout. Minimum allowed value for timeout is 60.
2. Delete the custom resource after completion of the operation.
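The optional fields described above are set next to target in the spec. The following is a minimal sketch, assuming the same pvc-1 target as in the sample; the name sample-2 and the chosen values are illustrative only and were picked to stay within the documented limits.

```yaml
apiVersion: csiaddons.openshift.io/v1alpha1
kind: ReclaimSpaceJob
metadata:
  name: sample-2                  # hypothetical name for illustration
spec:
  target:
    persistentVolumeClaim: pvc-1
  # Allow retries for up to 15 minutes from the start time
  # (documented default is 600, documented maximum is 1800).
  retryDeadlineSeconds: 900
  # Per-request gRPC timeout in seconds (documented minimum is 60).
  timeout: 360
```

The accepted field names and value ranges for your installed version can be confirmed with oc explain reclaimspacejob.spec.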
12.4. Enabling reclaim space operation using ReclaimSpaceCronJob
ReclaimSpaceCronJob invokes the reclaim space operation based on the given schedule such as daily, weekly, and so on. You have to create ReclaimSpaceCronJob only once for a persistent volume claim. The CSI-addons controller creates a ReclaimSpaceJob at the requested time and interval with the schedule attribute.
- Recommended schedule interval is @weekly.
- Minimum interval between each scheduled operation should be at least 24 hours. For example, @daily (At 00:00 every day) or "0 3 * * *" (At 3:00 every day).
- Schedule the ReclaimSpace operation during off-peak, maintenance window, or the interval when workload input/output is expected to be low.
Procedure
1. Create and apply the following custom resource for reclaim space operation:
   apiVersion: csiaddons.openshift.io/v1alpha1
   kind: ReclaimSpaceCronJob
   metadata:
     name: reclaimspacecronjob-sample
   spec:
     jobTemplate:
       spec:
         target:
           persistentVolumeClaim: data-pvc
         timeout: 360
     schedule: '@weekly'
     concurrencyPolicy: Forbid
   where,
   concurrencyPolicy
   - Describes what happens when a new ReclaimSpaceJob is scheduled by the ReclaimSpaceCronJob while a previous ReclaimSpaceJob is still running. The default, Forbid, prevents starting a new job, whereas Replace can be used to delete the running job (potentially in a failure state) and create a new one.
   failedJobsHistoryLimit
   - Specifies the number of failed ReclaimSpaceJobs that are kept for troubleshooting.
   jobTemplate
   - Specifies the ReclaimSpaceJob.spec structure that describes the details of the requested ReclaimSpaceJob operation.
   successfulJobsHistoryLimit
   - Specifies the number of successful ReclaimSpaceJob operations that are kept.
   schedule
   - Specifies the time and/or interval of the recurring operation request. It is in the same format as the Kubernetes CronJobs.
2. Delete the ReclaimSpaceCronJob custom resource when execution of reclaim space operation is no longer needed or when the target PVC is deleted.
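The sample in this procedure sets only some of the fields described above. The following sketch shows where the remaining fields sit in the spec, assuming the same data-pvc target. The resource name and the history limit values are illustrative assumptions, not documented defaults.

```yaml
apiVersion: csiaddons.openshift.io/v1alpha1
kind: ReclaimSpaceCronJob
metadata:
  name: reclaimspacecronjob-history-sample   # hypothetical name
spec:
  schedule: '@weekly'
  # Replace deletes a still-running job and starts a new one;
  # the default, Forbid, skips the new job instead.
  concurrencyPolicy: Replace
  # Illustrative retention values for finished jobs
  failedJobsHistoryLimit: 1
  successfulJobsHistoryLimit: 3
  jobTemplate:
    spec:
      target:
        persistentVolumeClaim: data-pvc
      timeout: 360
```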
12.5. Customising timeouts required for Reclaim Space Operation
Depending on the RBD volume size and its data pattern, Reclaim Space Operation might fail with the context deadline exceeded error. You can avoid this by increasing the timeout value.
The following example shows the failed status, as seen when inspecting the corresponding ReclaimSpaceJob with -o yaml:
Example
Status:
  Completion Time:  2023-03-08T18:56:18Z
  Conditions:
    Last Transition Time:  2023-03-08T18:56:18Z
    Message:               Failed to make controller request: context deadline exceeded
    Observed Generation:   1
    Reason:                failed
    Status:                True
    Type:                  Failed
  Message:     Maximum retry limit reached
  Result:      Failed
  Retries:     6
  Start Time:  2023-03-08T18:33:55Z
You can also set custom timeouts at global level by creating the following configmap:
Example
apiVersion: v1
kind: ConfigMap
metadata:
  name: csi-addons-config
  namespace: openshift-storage
data:
  "reclaim-space-timeout": "6m"
Restart the csi-addons operator pod.
$ oc delete po -n openshift-storage -l "app.kubernetes.io/name=csi-addons"
All Reclaim Space Operations started after the above configmap creation use the customized timeout.
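Because the global value is only used when a job does not specify its own timeout, the timeout can also be raised for a single volume. The following is a minimal sketch, assuming a scheduled job for the data-pvc volume; the resource name and the value of 600 seconds are illustrative and should be sized to the volume in question.

```yaml
apiVersion: csiaddons.openshift.io/v1alpha1
kind: ReclaimSpaceCronJob
metadata:
  name: reclaimspacecronjob-long-timeout   # hypothetical name
spec:
  schedule: '@weekly'
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      target:
        persistentVolumeClaim: data-pvc
      # Timeout in seconds for this volume only; jobs that omit
      # this field fall back to the global reclaimspace timeout.
      timeout: 600
```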