Chapter 4. Understanding persistent storage
Managing storage is a distinct problem from managing compute resources. MicroShift uses the Kubernetes persistent volume (PV) framework to allow cluster administrators to provision persistent storage for a cluster. Developers can use persistent volume claims (PVCs) to request PV resources without having specific knowledge of the underlying storage infrastructure.
4.1. Control permissions with security context constraints
You can use security context constraints (SCCs) to control permissions for the pods in your cluster. These permissions determine the actions that a pod can perform and what resources it can access. You can use SCCs to define a set of conditions that a pod must run with to be accepted into the system.
For more information, see Managing security context constraints.
Only RWO volume mounts are supported. Pods can be blocked by an SCC if they do not run with the contexts that the SCC requires.
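A minimal custom SCC might look like the following sketch. The name restricted-rwo-scc and the specific strategy values are illustrative assumptions, not values defined by this guide:

apiVersion: security.openshift.io/v1
kind: SecurityContextConstraints
metadata:
  name: restricted-rwo-scc       # hypothetical name for illustration
allowPrivilegedContainer: false  # pods using this SCC cannot run privileged
runAsUser:
  type: MustRunAsRange           # UID must fall within the namespace range
seLinuxContext:
  type: MustRunAs                # pods receive a managed SELinux context
fsGroup:
  type: MustRunAs
supplementalGroups:
  type: RunAsAny
users: []                        # grant access by adding service accounts here
groups: []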
4.2. Persistent storage overview
PVCs are specific to a namespace, and are created and used by developers as a means to use a PV. PV resources on their own are not scoped to any single namespace; they can be shared across the entire MicroShift cluster and claimed from any namespace. After a PV is bound to a PVC, that PV cannot then be bound to additional PVCs. This has the effect of scoping a bound PV to a single namespace.
PVs are defined by a PersistentVolume API object, which represents a piece of existing storage in the cluster that was either statically provisioned by the cluster administrator or dynamically provisioned using a StorageClass object. It is a resource in the cluster just like a node is a cluster resource.
PVs are volume plugins like Volumes but have a lifecycle that is independent of any individual pod that uses the PV. PV objects capture the details of the implementation of the storage, be that LVM, the host filesystem such as hostPath, or raw block devices.
High availability of storage in the infrastructure is left to the underlying storage provider.
Like PersistentVolumes, PersistentVolumeClaims (PVCs) are API objects, which represent a request for storage by a developer. A PVC is similar to a pod in that pods consume node resources and PVCs consume PV resources. For example, pods can request specific levels of resources, such as CPU and memory, while PVCs can request specific storage capacity and access modes. Access modes supported by OpenShift Container Platform are also definable in MicroShift. However, because MicroShift does not support multi-node deployments, only ReadWriteOnce (RWO) is pertinent.
4.4. Lifecycle of a volume and claim
PVs are resources in the cluster. PVCs are requests for those resources and also act as claim checks to the resource. The interaction between PVs and PVCs has the following lifecycle.
4.4.1. Provision storage
In response to requests from a developer, as defined in a PVC, a cluster administrator configures one or more dynamic provisioners that provision storage and a matching PV.
4.4.2. Bind claims
When you create a PVC, you request a specific amount of storage, specify the required access mode, and create a storage class to describe and classify the storage. The control loop in the master watches for new PVCs and binds the new PVC to an appropriate PV. If an appropriate PV does not exist, a provisioner for the storage class creates one.
The capacity of a bound PV might exceed the size requested in your PVC. This is especially true with manually provisioned PVs. To minimize the excess, MicroShift binds to the smallest PV that matches all other criteria.
Claims remain unbound indefinitely if a matching volume does not exist or cannot be created with any available provisioner servicing a storage class. Claims are bound as matching volumes become available. For example, a cluster with many manually provisioned 50Gi volumes would not match a PVC requesting 100Gi. The PVC can be bound when a 100Gi PV is added to the cluster.
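To make the binding flow concrete, the following sketch pairs a manually provisioned 100Gi PV with a PVC that the control loop can bind to it. The names pv-example and pvc-example, the hostPath path, and the manual storage class are illustrative assumptions:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-example              # hypothetical name
spec:
  capacity:
    storage: 100Gi              # must be at least the PVC request below
  accessModes:
    - ReadWriteOnce             # the only pertinent mode on MicroShift
  persistentVolumeReclaimPolicy: Retain
  storageClassName: manual      # must match the PVC for binding
  hostPath:
    path: /var/lib/storage/pv-example
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-example             # hypothetical name
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi            # the control loop binds this claim to pv-example
  storageClassName: manual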
4.4.3. Use pods and claimed PVs
Pods use claims as volumes. The cluster inspects the claim to find the bound volume and mounts that volume for a pod. For those volumes that support multiple access modes, you must specify which mode applies when you use the claim as a volume in a pod.
Once you have a claim and that claim is bound, the bound PV belongs to you for as long as you need it. You can schedule pods and access claimed PVs by including persistentVolumeClaim in the pod’s volumes block.
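A minimal volumes entry referencing a bound claim looks like the following fragment; the claim name pvc-example is an assumption, and a complete pod definition appears in "Claims as volumes":

volumes:
  - name: storage               # matched by a volumeMount name in the container spec
    persistentVolumeClaim:
      claimName: pvc-example    # hypothetical bound PVC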
If you attach persistent volumes that have high file counts to pods, those pods can fail or can take a long time to start. For more information, see When using Persistent Volumes with high file counts in OpenShift, why do pods fail to start or take an excessive amount of time to achieve "Ready" state?.
4.4.4. Release a persistent volume
When you are finished with a volume, you can delete the PVC object from the API, which allows reclamation of the resource. The volume is considered released when the claim is deleted, but it is not yet available for another claim. The previous claimant’s data remains on the volume and must be handled according to policy.
4.4.5. Reclaim policy for persistent volumes
The reclaim policy of a persistent volume tells the cluster what to do with the volume after it is released. A volume’s reclaim policy can be Retain, Recycle, or Delete.
- Retain reclaim policy allows manual reclamation of the resource for those volume plugins that support it.
- Recycle reclaim policy recycles the volume back into the pool of unbound persistent volumes once it is released from its claim. The Recycle reclaim policy is deprecated in MicroShift 4. Dynamic provisioning is recommended for equivalent and better functionality.
- Delete reclaim policy deletes both the PersistentVolume object from MicroShift and the associated storage asset in external infrastructure, such as Amazon Elastic Block Store (Amazon EBS) or VMware vSphere. Dynamically provisioned volumes are always deleted.
4.4.6. Reclaiming a persistent volume manually
When a persistent volume claim (PVC) is deleted, the underlying logical volume is handled according to the reclaimPolicy.
Procedure
To manually reclaim the PV as a cluster administrator:
- Delete the PV by running the following command:
$ oc delete pv <pv-name>
The associated storage asset in the external infrastructure, such as an AWS EBS, GCE PD, Azure Disk, or Cinder volume, still exists after the PV is deleted.
- Clean up the data on the associated storage asset.
- Delete the associated storage asset. Alternatively, to reuse the same storage asset, create a new PV with the storage asset definition, as shown in the sketch following this procedure.
The reclaimed PV is now available for use by another PVC.
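If you chose to reuse the same storage asset, the replacement PV might look like the following sketch; the name pv-reused and the hostPath path are illustrative assumptions standing in for your storage asset definition:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-reused                  # hypothetical name
spec:
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  hostPath:
    path: /var/lib/storage/reused  # path to the existing storage asset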
4.4.7. Changing the reclaim policy of a persistent volume
To change the reclaim policy of a persistent volume:
List the persistent volumes in your cluster:
$ oc get pv
Example output
NAME                                       CAPACITY   ACCESSMODES   RECLAIMPOLICY   STATUS   CLAIM            STORAGECLASS   REASON   AGE
pvc-b6efd8da-b7b5-11e6-9d58-0ed433a7dd94   4Gi        RWO           Delete          Bound    default/claim1   manual                  10s
pvc-b95650f8-b7b5-11e6-9d58-0ed433a7dd94   4Gi        RWO           Delete          Bound    default/claim2   manual                  6s
pvc-bb3ca71d-b7b5-11e6-9d58-0ed433a7dd94   4Gi        RWO           Delete          Bound    default/claim3   manual                  3s
Choose one of your persistent volumes and change its reclaim policy:
$ oc patch pv <your-pv-name> -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
Verify that your chosen persistent volume has the right policy:
$ oc get pv
Example output
NAME                                       CAPACITY   ACCESSMODES   RECLAIMPOLICY   STATUS   CLAIM            STORAGECLASS   REASON   AGE
pvc-b6efd8da-b7b5-11e6-9d58-0ed433a7dd94   4Gi        RWO           Delete          Bound    default/claim1   manual                  10s
pvc-b95650f8-b7b5-11e6-9d58-0ed433a7dd94   4Gi        RWO           Delete          Bound    default/claim2   manual                  6s
pvc-bb3ca71d-b7b5-11e6-9d58-0ed433a7dd94   4Gi        RWO           Retain          Bound    default/claim3   manual                  3s
In the preceding output, the volume bound to claim default/claim3 now has a Retain reclaim policy. The volume will not be automatically deleted when a user deletes claim default/claim3.
4.5. Persistent volumes
Each PV contains a spec and status, which is the specification and status of the volume, for example:
PersistentVolume object definition example
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv0001 1
spec:
  capacity:
    storage: 5Gi 2
  accessModes:
    - ReadWriteOnce 3
  persistentVolumeReclaimPolicy: Retain 4
  ...
status:
  ...
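- 1
- Name of the persistent volume.
- 2
- The amount of storage available to the volume.
- 3
- The access mode, defining the read-write and mount permissions.
- 4
- The reclaim policy, indicating how the resource is handled after it is released.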
4.5.1. Capacity
Generally, a persistent volume (PV) has a specific storage capacity. This is set by using the capacity attribute of the PV.
Currently, storage capacity is the only resource that can be set or requested. Future attributes may include IOPS, throughput, and so on.
4.5.2. Supported access modes
LVMS is the only CSI plugin that MicroShift supports. The hostPath volumes and LVs built in to OpenShift Container Platform also support RWO.
4.5.3. Phase
Volumes can be found in one of the following phases:
Phase | Description
---|---
Available | A free resource not yet bound to a claim.
Bound | The volume is bound to a claim.
Released | The claim was deleted, but the resource is not yet reclaimed by the cluster.
Failed | The volume has failed its automatic reclamation.
You can view the name of the PVC that is bound to the PV by running the following command:
$ oc get pv <pv-name>
4.5.3.1. Mount options
You can specify mount options while mounting a PV by using the attribute mountOptions.
For example:
Mount options example
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
  name: topolvm-provisioner
mountOptions:
  - uid=1500
  - gid=1500
parameters:
  csi.storage.k8s.io/fstype: xfs
provisioner: topolvm.io
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
mountOptions are not validated. Incorrect values will cause the mount to fail and an event to be logged to the PVC.
4.6. Persistent volumes with RWO access mode permissions
Persistent volume claims (PVCs) can be created with different access modes. A PVC with the ReadWriteOnce (RWO) access mode set allows multiple pods on the same node to read or write into the same PV at once.
There are instances when pods on the same node cannot read or write to the same PV. This happens when the pods do not have the same SELinux context. Persistent volumes can be mounted, then later claimed by PVCs, with the RWO access mode.
4.7. Checking pods for a mismatch
Check whether the pods have a mismatch by using the following procedure.
- Replace <pod_name_a> with the name of the first pod in the following procedure.
- Replace <pod_name_b> with the name of the second pod in the following procedure.
- Replace <pvc_mountpoint> with the mount point within the pods.
Procedure
List the mount point within the first pod by running the following command:
$ oc get pod <pod_name_a> -o jsonpath='{.spec.containers[].volumeMounts[].mountPath}' 1
- 1
- Replace <pod_name_a> with the name of the first pod.
Example output
/files /var/run/secrets/kubernetes.io/serviceaccount
List the mount point within the second pod by running the following command:
$ oc get pod <pod_name_b> -o jsonpath='{.spec.containers[].volumeMounts[].mountPath}' 1
- 1
- Replace <pod_name_b> with the name of the second pod.
Example output
/files /var/run/secrets/kubernetes.io/serviceaccount
Check the context and permissions inside the first pod by running the following command:
$ oc rsh <pod_name_a> ls -lZah <pvc_mountpoint> 1
- 1
- Replace <pod_name_a> with the name of the first pod and replace <pvc_mountpoint> with the mount point within the first pod.
Example output
total 12K
dr-xr-xr-x. 1 root root system_u:object_r:container_file_t:s0:c398,c806 40 Feb 17 13:36 .
dr-xr-xr-x. 1 root root system_u:object_r:container_file_t:s0:c398,c806 40 Feb 17 13:36 ..
[...]
Check the context and permissions inside the second pod by running the following command:
$ oc rsh <pod_name_b> ls -lZah <pvc_mountpoint> 1
- 1
- Replace <pod_name_b> with the name of the second pod and replace <pvc_mountpoint> with the mount point within the second pod.
Example output
total 12K
dr-xr-xr-x. 1 root root system_u:object_r:container_file_t:s0:c15,c25 40 Feb 17 13:34 .
dr-xr-xr-x. 1 root root system_u:object_r:container_file_t:s0:c15,c25 40 Feb 17 13:34 ..
[...]
- Compare both outputs to check whether there is a mismatch of the SELinux contexts.
4.8. Updating pods that have a mismatch
Update the SELinux context of the pods if a mismatch is found by using the following procedure.
Procedure
- When there is a mismatch of the SELinux context, create a new security context constraint (SCC) and assign it to both pods. To create an SCC, see Creating security context constraints.
Update the SELinux context as shown in the following example:
Example
[...]
seLinuxContext:
  type: MustRunAs
  seLinuxOptions:
    level: "s0:cXX,cYY"
[...]
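Applied in a pod specification, a shared SELinux level might look like the following sketch. The pod name, image, and claim name are illustrative assumptions, and s0:cXX,cYY stands in for the category pair you choose; both pods must set the same value:

apiVersion: v1
kind: Pod
metadata:
  name: pod-a                    # hypothetical; repeat for pod-b with the same level
spec:
  securityContext:
    seLinuxOptions:
      level: "s0:cXX,cYY"        # identical in both pods so they can share the RWO volume
  containers:
    - name: app
      image: registry.access.redhat.com/ubi9/ubi-minimal
      volumeMounts:
        - mountPath: /files
          name: shared
  volumes:
    - name: shared
      persistentVolumeClaim:
        claimName: shared-pvc    # hypothetical PVC bound to the shared PV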
4.9. Verifying pods after resolving a mismatch
Verify the security context constraint (SCC) and the SELinux label of both pods by using the following verification steps.
Verification
Verify that the same SCC is assigned to the first pod by running the following command:
$ oc describe pod <pod_name_a> | grep -i scc 1
- 1
- Replace <pod_name_a> with the name of the first pod.
Example output
openshift.io/scc: restricted
Verify that the same SCC is assigned to the second pod by running the following command:
$ oc describe pod <pod_name_b> | grep -i scc 1
- 1
- Replace <pod_name_b> with the name of the second pod.
Example output
openshift.io/scc: restricted
Verify that the same SELinux label is applied to the first pod by running the following command:
$ oc exec <pod_name_a> -- ls -laZ <pvc_mountpoint> 1
- 1
- Replace <pod_name_a> with the name of the first pod and replace <pvc_mountpoint> with the mount point within the first pod.
Example output
total 4
drwxrwsrwx. 2 root 1000670000 system_u:object_r:container_file_t:s0:c10,c26 19 Aug 29 18:17 .
dr-xr-xr-x. 1 root root system_u:object_r:container_file_t:s0:c10,c26 61 Aug 29 18:16 ..
-rw-rw-rw-. 1 1000670000 1000670000 system_u:object_r:container_file_t:s0:c10,c26 29 Aug 29 18:17 test1
[...]
Verify that the same SELinux label is applied to the second pod by running the following command:
$ oc exec <pod_name_b> -- ls -laZ <pvc_mountpoint> 1
- 1
- Replace <pod_name_b> with the name of the second pod and replace <pvc_mountpoint> with the mount point within the second pod.
Example output
total 4
drwxrwsrwx. 2 root 1000670000 system_u:object_r:container_file_t:s0:c10,c26 19 Aug 29 18:17 .
dr-xr-xr-x. 1 root root system_u:object_r:container_file_t:s0:c10,c26 61 Aug 29 18:16 ..
-rw-rw-rw-. 1 1000670000 1000670000 system_u:object_r:container_file_t:s0:c10,c26 29 Aug 29 18:17 test1
[...]
4.10. Persistent volume claims
Each PersistentVolumeClaim object contains a spec and status, which is the specification and status of the persistent volume claim (PVC), for example:
PersistentVolumeClaim object definition example
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: myclaim 1
spec:
  accessModes:
    - ReadWriteOnce 2
  resources:
    requests:
      storage: 8Gi 3
  storageClassName: gold 4
status:
  ...
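- 1
- Name of the PVC.
- 2
- The access mode, defining the read-write and mount permissions.
- 3
- The amount of storage requested by the PVC.
- 4
- Name of the storage class required by the claim.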
4.10.1. Storage classes
Claims can optionally request a specific storage class by specifying the storage class’s name in the storageClassName attribute. Only PVs of the requested class, ones with the same storageClassName as the PVC, can be bound to the PVC. The cluster administrator can configure dynamic provisioners to service one or more storage classes. The cluster administrator can create a PV on demand that matches the specifications in the PVC.
The cluster administrator can also set a default storage class for all PVCs. When a default storage class is configured, a PVC must explicitly set the StorageClass or storageClassName annotation to "" to be bound to a PV without a storage class.
If more than one storage class is marked as default, a PVC can only be created if the storageClassName is explicitly specified. Therefore, only one storage class should be set as the default.
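As a sketch, a claim that opts out of the default storage class sets the attribute to an empty string; the claim name no-class-claim is an illustrative assumption:

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: no-class-claim     # hypothetical name
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: ""     # binds only to PVs that have no storage class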
4.10.2. Access modes
Claims use the same conventions as volumes when requesting storage with specific access modes.
4.10.3. Resources
Like pods, claims can request specific quantities of a resource. In this case, the request is for storage. The same resource model applies to volumes and claims.
4.10.4. Claims as volumes
Pods access storage by using the claim as a volume. Claims must exist in the same namespace as the pod using the claim. The cluster finds the claim in the pod’s namespace and uses it to get the PersistentVolume backing the claim. The volume is mounted to the host and into the pod, for example:
Mount volume to the host and into the pod example
kind: Pod
apiVersion: v1
metadata:
  name: mypod
spec:
  containers:
    - name: myfrontend
      image: dockerfile/nginx
      volumeMounts:
        - mountPath: "/var/www/html" 1
          name: mypd 2
  volumes:
    - name: mypd
      persistentVolumeClaim:
        claimName: myclaim 3
- 1
- Path to mount the volume inside the pod. Do not mount to the container root, /, or any path that is the same in the host and the container. This can corrupt your host system if the container is sufficiently privileged, such as the host /dev/pts files. It is safe to mount the host by using /host.
- 2
- Name of the volume to mount.
- 3
- Name of the PVC, that exists in the same namespace, to use.
4.11. Using fsGroup to reduce pod timeouts
If a storage volume contains many files (~1,000,000 or greater), you may experience pod timeouts.
This can occur because, by default, MicroShift recursively changes ownership and permissions for the contents of each volume to match the fsGroup specified in a pod’s securityContext when that volume is mounted. For large volumes, checking and changing ownership and permissions can be time consuming, slowing pod startup. You can use the fsGroupChangePolicy field inside a securityContext to control the way that MicroShift checks and manages ownership and permissions for a volume.
fsGroupChangePolicy defines behavior for changing ownership and permission of the volume before being exposed inside a pod. This field only applies to volume types that support fsGroup-controlled ownership and permissions. This field has two possible values:
- OnRootMismatch: Only change permissions and ownership if the permissions and ownership of the root directory do not match the expected permissions of the volume. This can help shorten the time it takes to change the ownership and permissions of a volume, reducing pod timeouts.
- Always: Always change the permissions and ownership of the volume when a volume is mounted.
fsGroupChangePolicy example
securityContext:
runAsUser: 1000
runAsGroup: 3000
fsGroup: 2000
fsGroupChangePolicy: "OnRootMismatch" 1
...
- 1
- OnRootMismatch specifies skipping the recursive permission change, helping to avoid pod timeout problems.
The fsGroupChangePolicy field has no effect on ephemeral volume types, such as secret, configMap, and emptyDir.