This documentation is for a release that is no longer maintained
See documentation for the latest supported version 3 or the latest supported version 4.
Chapter 15. Configuring Persistent Storage
15.1. Overview
The Kubernetes persistent volume framework allows you to provision an OpenShift cluster with persistent storage using networked storage available in your environment. This can be done after completing the initial OpenShift installation depending on your application needs, giving users a way to request those resources without having any knowledge of the underlying infrastructure.
These topics show how to configure persistent volumes in OpenShift using the following supported volume plug-ins:
15.2. Persistent Storage Using NFS
15.2.1. Overview
OpenShift clusters can be provisioned with persistent storage using NFS. Persistent volumes (PVs) and persistent volume claims (PVCs) provide a convenient method for sharing a volume across a project. While the NFS-specific information contained in a PV definition could also be defined directly in a pod definition, doing so does not create the volume as a distinct cluster resource, making the volume more susceptible to conflicts.
This topic covers the specifics of using the NFS persistent storage type. Some familiarity with OpenShift and NFS is beneficial. See the Persistent Storage concept topic for details on the OpenShift persistent volume (PV) framework in general.
15.2.2. Provisioning
Storage must exist in the underlying infrastructure before it can be mounted as a volume in OpenShift. To provision NFS volumes, a list of NFS servers and export paths is all that is required.
You must first create an object definition for the PV:
Example 15.1. PV Object Definition Using NFS
- 1
- The name of the volume. This is the PV identity in various oc <command> pod commands.
- 2
- The amount of storage allocated to this volume.
- 3
- Though this appears to be related to controlling access to the volume, it is actually used similarly to labels to match a PVC to a PV. Currently, no access rules are enforced based on the accessModes.
- 4
- The volume type being used, in this case the nfs plug-in.
- 5
- The path that is exported by the NFS server.
- 6
- The host name or IP address of the NFS server.
- 7
- The reclaim policy for the PV. This defines what happens to a volume when released from its claim. Valid options are Retain (default) and Recycle. See Reclaiming Resources.
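A definition consistent with the numbered callouts above might look like the following sketch. The name, capacity, and access mode follow the command output shown in this section; the path and server values are placeholders (assumptions), and the callout numbers appear as comments.

```yaml
# Sketch of an NFS-backed PV. The path and server values are placeholders.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv0001                           # (1) PV name
spec:
  capacity:
    storage: 5Gi                         # (2) storage allocated to the volume
  accessModes:
    - ReadWriteOnce                      # (3) used to match a PVC to this PV
  nfs:                                   # (4) NFS volume plug-in
    path: /exports/pv0001                # (5) exported path (placeholder)
    server: nfs.example.com              # (6) NFS server host name or IP (placeholder)
  persistentVolumeReclaimPolicy: Retain  # (7) reclaim policy
```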
Each NFS volume must be mountable by all schedulable nodes in the cluster.
Save the definition to a file, for example nfs-pv.yaml, and create the PV:
$ oc create -f nfs-pv.yaml
persistentvolume "pv0001" created
Verify that the PV was created:
# oc get pv
NAME                     LABELS    CAPACITY     ACCESSMODES   STATUS      CLAIM     REASON    AGE
pv0001                   <none>    5368709120   RWO           Available                       31s
The next step can be to create a PVC, which binds to the new PV:
Example 15.2. PVC Object Definition
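A claim that could bind to an RWO PV of this kind might look like the following sketch. The claim name nfs-claim1 and the requested size are illustrative assumptions; the access mode is chosen to match the PV shown in the output above.

```yaml
# Sketch of a PVC. The name and requested size are hypothetical.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-claim1
spec:
  accessModes:
    - ReadWriteOnce        # must match an access mode offered by the PV
  resources:
    requests:
      storage: 1Gi         # bound to a PV of equal or greater capacity
```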
Save the definition to a file, for example nfs-claim.yaml, and create the PVC:
# oc create -f nfs-claim.yaml
15.2.3. Enforcing Disk Quotas
You can use disk partitions to enforce disk quotas and size constraints. Each partition can be its own export. Each export is one PV. OpenShift enforces unique names for PVs, but the uniqueness of the NFS volume’s server and path is up to the administrator.
Enforcing quotas in this way allows the developer to request persistent storage by a specific amount (for example, 10Gi) and be matched with a corresponding volume of equal or greater capacity.
15.2.4. NFS Volume Security
This section covers NFS volume security, including matching permissions and SELinux considerations. The user is expected to understand the basics of POSIX permissions, process UIDs, supplemental groups, and SELinux.
See the full Volume Security topic before implementing NFS volumes.
					Developers request NFS storage by referencing, in the volumes section of their pod definition, either a PVC by name or the NFS volume plug-in directly.
				
The /etc/exports file on the NFS server contains the accessible NFS directories. The target NFS directory has POSIX owner and group IDs. The OpenShift NFS plug-in mounts the container’s NFS directory with the same POSIX ownership and permissions found on the exported NFS directory. However, the container does not run with its effective UID equal to the owner of the NFS mount, even though that would be the desired behavior.
As an example, if the target NFS directory appears on the NFS server as:
# ls -lZ /opt/nfs -d
drwxrws---. nfsnobody 5555 unconfined_u:object_r:usr_t:s0   /opt/nfs
# id nfsnobody
uid=65534(nfsnobody) gid=65534(nfsnobody) groups=65534(nfsnobody)
Then the container must match SELinux labels, and either run with a UID of 65534 (nfsnobody owner) or with 5555 in its supplemental groups in order to access the directory.
The owner ID of 65534 is used as an example. Even though NFS’s root_squash maps root (0) to nfsnobody (65534), NFS exports can have arbitrary owner IDs. Owner 65534 is not required for NFS exports.
15.2.4.1. Group IDs
The recommended way to handle NFS access (assuming it is not an option to change permissions on the NFS export) is to use supplemental groups. Supplemental groups in OpenShift are used for shared storage, of which NFS is an example. In contrast, block storage, such as Ceph RBD or iSCSI, uses the fsGroup SCC strategy and the fsGroup value in the pod’s securityContext.
					
It is generally preferable to use supplemental group IDs to gain access to persistent storage versus using user IDs. Supplemental groups are covered further in the full Volume Security topic.
						Because the group ID on the example target NFS directory shown above is 5555, the pod can define that group ID using supplementalGroups under the pod-level securityContext definition. For example:
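A pod that requests group 5555 might be sketched as follows. The pod name, container name, image, mount path, and claim name are hypothetical; only the supplementalGroups value comes from the example directory above.

```yaml
# Sketch of a pod that adds group 5555 to its supplemental groups.
apiVersion: v1
kind: Pod
metadata:
  name: nfs-pod                              # hypothetical
spec:
  securityContext:                           # pod-level, not per container
    supplementalGroups: [5555]               # group ID of the NFS export
  containers:
    - name: app                              # hypothetical
      image: registry.example.com/app:latest # hypothetical
      volumeMounts:
        - name: nfs-vol
          mountPath: /usr/share/data         # hypothetical
  volumes:
    - name: nfs-vol
      persistentVolumeClaim:
        claimName: nfs-claim1                # hypothetical claim name
```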
					
						Assuming there are no custom SCCs that might satisfy the pod’s requirements, the pod likely matches the restricted SCC. This SCC has the supplementalGroups strategy set to RunAsAny, meaning that any supplied group ID is accepted without range checking.
					
As a result, the above pod passes admission and is launched. However, if group ID range checking is desired, a custom SCC, as described in pod security and custom SCCs, is the preferred solution. A custom SCC can be created such that minimum and maximum group IDs are defined, group ID range checking is enforced, and a group ID of 5555 is allowed.
15.2.4.2. User IDs
User IDs can be defined in the container image or in the pod definition. The full Volume Security topic covers controlling storage access based on user IDs, and should be read prior to setting up NFS persistent storage.
It is generally preferable to use supplemental group IDs to gain access to persistent storage versus using user IDs.
In the example target NFS directory shown above, the container needs its UID set to 65534 (ignoring group IDs for the moment), so the following can be added to the pod definition:
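The relevant fragment of such a pod definition might look like this sketch. The container name and image are hypothetical; runAsUser is set under the container-level securityContext.

```yaml
# Fragment of a pod spec that requests UID 65534 for one container.
spec:
  containers:
    - name: app                              # hypothetical
      image: registry.example.com/app:latest # hypothetical
      securityContext:
        runAsUser: 65534                     # owner of the example NFS directory
```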
Assuming the default project and the restricted SCC, the pod’s requested user ID of 65534 is not allowed, and therefore the pod fails. The pod fails for the following reasons:
- It requests 65534 as its user ID.
- All SCCs available to the pod are examined to see which SCC allows a user ID of 65534 (actually, all policies of the SCCs are checked but the focus here is on user ID).
- Because all available SCCs use MustRunAsRange for their runAsUser strategy, UID range checking is required.
- 65534 is not included in the SCC or project’s user ID range.
It is generally considered a good practice not to modify the predefined SCCs. The preferred way to fix this situation is to create a custom SCC, as described in the full Volume Security topic. A custom SCC can be created such that minimum and maximum user IDs are defined, UID range checking is still enforced, and the UID of 65534 is allowed.
15.2.4.3. SELinux
See the full Volume Security topic for information on controlling storage access in conjunction with using SELinux.
By default, SELinux does not allow writing from a pod to a remote NFS server. The NFS volume mounts correctly, but is read-only.
To enable writing to NFS volumes with SELinux enforcing on each node, run:
# setsebool -P virt_use_nfs 1
# setsebool -P virt_sandbox_use_nfs 1
The -P option above makes the boolean persistent across reboots.
The virt_use_nfs boolean is defined by the docker-selinux package. If an error indicates that this boolean is not defined, ensure that this package is installed.
15.2.4.4. Export Settings
In order to enable arbitrary container users to read and write the volume, each exported volume on the NFS server should conform to the following conditions:
- Each export must be:
  /<example_fs> *(rw,root_squash,no_wdelay)
  The no_wdelay option prevents the server from delaying writes, which greatly improves read-after-write consistency.
- The firewall must be configured to allow traffic to the mount point. For NFSv4, the default port is 2049 (nfs). For NFSv3, there are three ports to configure: 2049 (nfs), 20048 (mountd), and 111 (portmapper).
  NFSv4
  # iptables -I INPUT 1 -p tcp --dport 2049 -j ACCEPT
  NFSv3
  # iptables -I INPUT 1 -p tcp --dport 2049 -j ACCEPT
  # iptables -I INPUT 1 -p tcp --dport 20048 -j ACCEPT
  # iptables -I INPUT 1 -p tcp --dport 111 -j ACCEPT
- The NFS export and directory must be set up so that it is accessible by the target pods. Either set the export to be owned by the container’s primary UID, or supply the pod group access using supplementalGroups, as shown in Group IDs above. See the full Volume Security topic for additional pod security information as well.
15.2.5. Reclaiming Resources
NFS implements the OpenShift Recyclable plug-in interface. Automatic processes handle reclamation tasks based on policies set on each persistent volume.
By default, PVs are set to Retain. NFS volumes that are set to Recycle are scrubbed (that is, rm -rf is run on the volume) after being released from their claim (that is, after the user’s PersistentVolumeClaim bound to the volume is deleted). Once recycled, the NFS volume can be bound to a new claim.
				
Once the claim to a PV is released (that is, the PVC is deleted), the PV object should not be re-used. Instead, a new PV should be created with the same basic volume details as the original.
					For example, the administrator creates a PV named nfs1:
				
					The user creates PVC1, which binds to nfs1. The user then deletes PVC1, releasing claim to nfs1, which causes nfs1 to be Released. If the administrator wishes to make the same NFS share available, they should create a new PV with the same NFS server details, but a different PV name:
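The original and replacement PVs in this scenario might be sketched as two definitions that differ only in metadata.name. The capacity, access mode, server, and path values are placeholders (assumptions); the names nfs1 and nfs2 follow the scenario above, with nfs2 chosen here as the new name.

```yaml
# Original PV, now in the Released state.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs1
spec:
  capacity:
    storage: 1Gi               # placeholder
  accessModes:
    - ReadWriteMany            # placeholder
  nfs:
    server: nfs.example.com    # placeholder
    path: /exports/share       # placeholder
---
# Replacement PV: same NFS server details, different PV name.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs2
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteMany
  nfs:
    server: nfs.example.com
    path: /exports/share
```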
				
					Deleting the original PV and re-creating it with the same name is discouraged. Attempting to manually change the status of a PV from Released to Available causes errors and potential data loss.
				
A PV with a retention policy of Recycle scrubs (rm -rf) the data and marks the volume as Available for claim. The Recycle retention policy is deprecated starting in OpenShift Enterprise 3.6 and should be avoided. Anyone using the recycler should use dynamic provisioning and volume deletion instead.
					
15.2.6. Automation
Clusters can be provisioned with persistent storage using NFS in the following ways:
- Enforce storage quotas using disk partitions.
- Enforce security by restricting volumes to the project that has a claim to them.
- Configure reclamation of discarded resources for each PV.
There are many ways that you can use scripts to automate the above tasks. You can use an example Ansible playbook to help you get started.
15.2.7. Additional Configuration and Troubleshooting
Depending on what version of NFS is being used and how it is configured, there may be additional configuration steps needed for proper export and security mapping. The following are some that may apply:
- NFSv4 mount incorrectly shows all files with ownership of nobody:nobody
- Disabling ID mapping on NFSv4
15.3. Persistent Storage Using GlusterFS
15.3.1. Overview
OpenShift Enterprise clusters can be provisioned with persistent storage using GlusterFS.
Persistent volumes (PVs) and persistent volume claims (PVCs) can share volumes across a single project. While the GlusterFS-specific information contained in a PV definition could also be defined directly in a pod definition, doing so does not create the volume as a distinct cluster resource, making the volume more susceptible to conflicts.
This topic presumes some familiarity with OpenShift Enterprise and GlusterFS. See the Persistent Storage topic for details on the OpenShift Enterprise PV framework in general.
High-availability of storage in the infrastructure is left to the underlying storage provider.
15.3.2. Provisioning
To provision GlusterFS volumes the following are required:
- An existing storage device in your underlying infrastructure.
- A distinct list of servers (IP addresses) in the Gluster cluster, to be defined as endpoints.
- A service, to persist the endpoints (optional).
- An existing Gluster volume to be referenced in the persistent volume object.
- glusterfs-fuse installed on each schedulable OpenShift Enterprise node in your cluster:
  # yum install glusterfs-fuse
OpenShift Enterprise nodes can also host a gluster node (referred to as hyperconverged storage). However, performance may be less predictable and harder to manage.
15.3.2.1. Creating Gluster Endpoints
						An endpoints definition defines the GlusterFS cluster as EndPoints and includes the IP addresses of your Gluster servers. The port value can be any numeric value within the accepted range of ports. Optionally, you can create a service that persists the endpoints.
					
- Define the following service:
  Example 15.3. Gluster Service Definition
  - 1
  - This name must be defined in the endpoints definition to match the endpoints to this service.
- Save the service definition to a file, for example gluster-service.yaml, then create the service:
  $ oc create -f gluster-service.yaml
- Verify that the service was created:
  # oc get services
  NAME                CLUSTER_IP      EXTERNAL_IP   PORT(S)   SELECTOR   AGE
  glusterfs-cluster   172.30.205.34   <none>        1/TCP     <none>     44s
- Define the Gluster endpoints:
- Save the endpoints definition to a file, for example gluster-endpoints.yaml, then create the endpoints:
  $ oc create -f gluster-endpoints.yaml
  endpoints "glusterfs-cluster" created
- Verify that the endpoints were created:
  $ oc get endpoints
  NAME                ENDPOINTS                             AGE
  docker-registry     10.1.0.3:5000                         4h
  glusterfs-cluster   192.168.122.221:1,192.168.122.222:1   11s
  kubernetes          172.16.35.3:8443                      4d
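The service and endpoints definitions used in the steps above might look like the following sketch, with the name, IP addresses, and port taken from the command output. The two documents are shown together for brevity; in the procedure they would be saved to gluster-service.yaml and gluster-endpoints.yaml respectively.

```yaml
# Service that persists the endpoints.
apiVersion: v1
kind: Service
metadata:
  name: glusterfs-cluster      # (1) must match the endpoints name
spec:
  ports:
    - port: 1
---
# Endpoints listing the Gluster servers.
apiVersion: v1
kind: Endpoints
metadata:
  name: glusterfs-cluster      # same name as the service
subsets:
  - addresses:
      - ip: 192.168.122.221
    ports:
      - port: 1                # any value in the accepted port range
  - addresses:
      - ip: 192.168.122.222
    ports:
      - port: 1
```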
15.3.2.2. Creating the Persistent Volume
- Next, define the PV in an object definition before creating it in OpenShift Enterprise:
  Example 15.5. Persistent Volume Object Definition Using GlusterFS
  - 1
  - The name of the volume. This is how it is identified via persistent volume claims or from pods.
  - 2
  - The amount of storage allocated to this volume.
  - 3
  - accessModes are used as labels to match a PV and a PVC. They currently do not define any form of access control.
  - 4
  - The volume type being used, in this case the glusterfs plug-in.
  - 5
  - The endpoints name that defines the Gluster cluster created in Creating Gluster Endpoints.
  - 6
  - The Gluster volume that will be accessed, as shown in the gluster volume status command.
- Save the definition to a file, for example gluster-pv.yaml, and create the persistent volume:
  # oc create -f gluster-pv.yaml
- Verify that the persistent volume was created:
  # oc get pv
  NAME                     LABELS    CAPACITY     ACCESSMODES   STATUS      CLAIM     REASON    AGE
  gluster-default-volume   <none>    2147483648   RWX           Available                       2s
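A PV definition consistent with the callouts and the output above might look like this sketch. The Gluster volume name myVol1 is a hypothetical placeholder; the other values follow the command output and the endpoints created earlier.

```yaml
# Sketch of a GlusterFS-backed PV.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: gluster-default-volume   # (1) PV name
spec:
  capacity:
    storage: 2Gi                 # (2) 2147483648 bytes in the output above
  accessModes:
    - ReadWriteMany              # (3) shown as RWX
  glusterfs:                     # (4) GlusterFS volume plug-in
    endpoints: glusterfs-cluster # (5) endpoints name
    path: myVol1                 # (6) Gluster volume name (hypothetical)
    readOnly: false
```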
15.3.2.3. Creating the Persistent Volume Claim
						Developers request GlusterFS storage by referencing either a PVC or the Gluster volume plug-in directly in the volumes section of a pod spec. A PVC exists only in the user’s project and can only be referenced by pods within that project. Any attempt to access a PV across a project causes the pod to fail.
					
- Create a PVC that will bind to the new PV: 
- Save the definition to a file, for example gluster-claim.yaml, and create the PVC:
  # oc create -f gluster-claim.yaml
Note: PVs and PVCs make sharing a volume across a project simpler. The gluster-specific information contained in the PV definition can also be defined directly in a pod specification.
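A claim that could bind to an RWX Gluster PV might be sketched as follows. The claim name and requested size are illustrative assumptions.

```yaml
# Sketch of a PVC for shared (RWX) storage.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: gluster-claim      # hypothetical
spec:
  accessModes:
    - ReadWriteMany        # matches the RWX access mode of the PV
  resources:
    requests:
      storage: 1Gi         # hypothetical; must not exceed the PV capacity
```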
15.3.3. Gluster Volume Security
This section covers Gluster volume security, including matching permissions and SELinux considerations. Understanding the basics of POSIX permissions, process UIDs, supplemental groups, and SELinux is presumed.
See the full Volume Security topic before implementing Gluster volumes.
As an example, assume that the target Gluster volume, HadoopVol, is mounted under /mnt/glusterfs/, with the following POSIX permissions and SELinux labels:
				
# ls -lZ /mnt/glusterfs/
drwxrwx---. yarn hadoop system_u:object_r:fusefs_t:s0    HadoopVol
# id yarn
uid=592(yarn) gid=590(hadoop) groups=590(hadoop)
In order to access the HadoopVol volume, containers must match the SELinux label, and run either with a UID of 592 or with 590 in their supplemental groups. The OpenShift Enterprise GlusterFS plug-in mounts the volume in the container with the same POSIX ownership and permissions found on the target gluster mount, namely the owner will be 592 and the group ID will be 590. However, the container is not run with its effective UID equal to 592, nor with its GID equal to 590, even though that would be the desired behavior. Instead, a container’s UID and supplemental groups are determined by Security Context Constraints (SCCs) and the project defaults.
				
15.3.3.1. Group IDs
Configure Gluster volume access by using supplemental groups, assuming it is not an option to change permissions on the Gluster mount. Supplemental groups in OpenShift Enterprise are used for shared storage, such as GlusterFS. In contrast, block storage, such as Ceph RBD or iSCSI, uses the fsGroup SCC strategy and the fsGroup value in the pod’s securityContext.
					
Use supplemental group IDs instead of user IDs to gain access to persistent storage. Supplemental groups are covered further in the full Volume Security topic.
						The group ID on the target Gluster mount example above is 590. Therefore, a pod can define that group ID using supplementalGroups under the pod-level securityContext definition. For example:
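The relevant fragment of such a pod definition might look like this sketch; the rest of the pod spec (containers, volumes) is omitted and would be specific to the application.

```yaml
# Fragment of a pod spec that adds group 590 to the supplemental groups.
spec:
  securityContext:             # pod-level security context
    supplementalGroups: [590]  # group ID of the example Gluster mount
```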
					
						Assuming there are no custom SCCs that satisfy the pod’s requirements, the pod matches the restricted SCC. This SCC has the supplementalGroups strategy set to RunAsAny, meaning that any supplied group IDs are accepted without range checking.
					
As a result, the above pod will pass admission and can be launched. However, if group ID range checking is desired, use a custom SCC, as described in pod security and custom SCCs. A custom SCC can be created to define minimum and maximum group IDs, enforce group ID range checking, and allow a group ID of 590.
15.3.3.2. User IDs
User IDs can be defined in the container image or in the pod definition. The full Volume Security topic covers controlling storage access based on user IDs, and should be read prior to setting up GlusterFS persistent storage.
Use supplemental group IDs instead of user IDs to gain access to persistent storage.
In the target Gluster mount example above, the container needs a UID set to 592, so the following can be added to the pod definition:
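As a sketch, the fragment could set runAsUser under a container-level securityContext. The container name is hypothetical.

```yaml
# Fragment of a pod spec that requests UID 592 for one container.
spec:
  containers:
    - name: app              # hypothetical
      securityContext:
        runAsUser: 592       # owner (yarn) of the example Gluster mount
```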
With the default project and the restricted SCC, a pod’s requested user ID of 592 will not be allowed, and the pod will fail. This is because:
- The pod requests 592 as its user ID.
- All SCCs available to the pod are examined to see which SCC will allow a user ID of 592.
- Because all available SCCs use MustRunAsRange for their runAsUser strategy, UID range checking is required.
- 592 is not included in the SCC or project’s user ID range.
Do not modify the predefined SCCs. Instead, create a custom SCC so that minimum and maximum user IDs are defined, UID range checking is still enforced, and the UID of 592 is allowed.
15.3.3.3. SELinux
See the full Volume Security topic for information on controlling storage access in conjunction with using SELinux.
By default, SELinux does not allow writing from a pod to a remote Gluster server.
To enable writing to GlusterFS volumes with SELinux enforcing on each node, run:
$ sudo setsebool -P virt_sandbox_use_fusefs on
							The virt_sandbox_use_fusefs boolean is defined by the docker-selinux package. If you get an error saying it is not defined, please ensure that this package is installed.
						
The -P option makes the boolean persistent across reboots.
					
15.4. Persistent Storage Using OpenStack Cinder
15.4.1. Overview
You can provision your OpenShift cluster with persistent storage using OpenStack Cinder. Some familiarity with Kubernetes and OpenStack is assumed.
Before creating persistent volumes using Cinder, OpenShift must first be properly configured for OpenStack.
The Kubernetes persistent volume framework allows administrators to provision a cluster with persistent storage and gives users a way to request those resources without having any knowledge of the underlying infrastructure. Persistent volumes are not bound to a single project or namespace; they can be shared across the OpenShift cluster. Persistent volume claims, however, are specific to a project or namespace and can be requested by users.
For a detailed example, see the guide for WordPress and MySQL using persistent volumes.
High-availability of storage in the infrastructure is left to the underlying storage provider.
15.4.2. Provisioning
					Storage must exist in the underlying infrastructure before it can be mounted as a volume in OpenShift. After ensuring OpenShift is configured for OpenStack, all that is required for Cinder is a Cinder volume ID and the PersistentVolume API.
				
15.4.2.1. Creating the Persistent Volume
You must define your persistent volume in an object definition before creating it in OpenShift:
Example 15.7. Persistent Volume Object Definition Using Cinder
- 1
- The name of the volume. This will be how it is identified via persistent volume claims or from pods.
- 2
- The amount of storage allocated to this volume.
- 3
- This defines the volume type being used, in this case the cinder plug-in.
- 4
- File system type to mount.
- 5
- This is the Cinder volume that will be used.
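A definition consistent with the callouts above might look like the following sketch. The name, capacity, and access mode follow the command output in this section; the file system type and the volume ID are placeholders (assumptions).

```yaml
# Sketch of a Cinder-backed PV.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv0001                           # (1) PV name
spec:
  capacity:
    storage: 5Gi                         # (2) storage allocated to the volume
  accessModes:
    - ReadWriteOnce
  cinder:                                # (3) Cinder volume plug-in
    fsType: ext4                         # (4) file system type (placeholder)
    volumeID: REPLACE_WITH_CINDER_VOLUME_ID  # (5) existing Cinder volume (placeholder)
```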
Changing the value of the fsType parameter after the volume has been formatted and provisioned can result in data loss and pod failure.
						
Save your definition to a file, for example cinder-pv.yaml, and create the persistent volume:
# oc create -f cinder-pv.yaml
persistentvolume "pv0001" created
Verify that the persistent volume was created:
# oc get pv
NAME      LABELS    CAPACITY   ACCESSMODES   STATUS      CLAIM     REASON    AGE
pv0001    <none>    5Gi        RWO           Available                       2s
Users can then request storage using persistent volume claims, which can now utilize your new persistent volume.
Persistent volume claims only exist in the user’s namespace and can only be referenced by a pod within that same namespace. Any attempt to access a persistent volume from a different namespace causes the pod to fail.
15.4.2.2. Volume Format
						Before OpenShift mounts the volume and passes it to a container, it checks that it contains a file system as specified by the fsType parameter in the persistent volume definition. If the device is not formatted with the file system, all data from the device is erased and the device is automatically formatted with the given file system.
					
This allows using unformatted Cinder volumes as persistent volumes, because OpenShift Enterprise formats them before the first use.
15.5. Persistent Storage Using Ceph Rados Block Device (RBD)
15.5.1. Overview
OpenShift Enterprise clusters can be provisioned with persistent storage using Ceph RBD.
Persistent volumes (PVs) and persistent volume claims (PVCs) can share volumes across a single project. While the Ceph RBD-specific information contained in a PV definition could also be defined directly in a pod definition, doing so does not create the volume as a distinct cluster resource, making the volume more susceptible to conflicts.
This topic presumes some familiarity with OpenShift Enterprise and Ceph RBD. See the Persistent Storage concept topic for details on the OpenShift Enterprise persistent volume (PV) framework in general.
Project and namespace are used interchangeably throughout this document. See Projects and Users for details on the relationship.
High-availability of storage in the infrastructure is left to the underlying storage provider.
15.5.2. Provisioning
To provision Ceph volumes, the following are required:
- An existing storage device in your underlying infrastructure.
- The Ceph key to be used in an OpenShift Enterprise secret object.
- The Ceph image name.
- The file system type on top of the block storage (e.g., ext4).
- ceph-common installed on each schedulable OpenShift Enterprise node in your cluster:
  # yum install ceph-common
15.5.2.1. Creating the Ceph Secret
Define the authorization key in a secret configuration, which is then converted to base64 for use by OpenShift Enterprise.
In order to use Ceph storage to back a persistent volume, the secret must be created in the same project as the PVC and pod. The secret cannot simply be in the default project.
- Run ceph auth get-key on a Ceph MON node to display the key value for the client.admin user:
- Save the secret definition to a file, for example ceph-secret.yaml, then create the secret:
  $ oc create -f ceph-secret.yaml
- Verify that the secret was created:
  # oc get secret ceph-secret
  NAME          TYPE      DATA      AGE
  ceph-secret   Opaque    1         23d
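The secret definition saved to ceph-secret.yaml might look like the following sketch. The secret name matches the output above; the key value is a placeholder and would be replaced with the base64-encoded output of ceph auth get-key for the client.admin user.

```yaml
# Sketch of the Ceph secret. The key value is a placeholder, not a real key.
apiVersion: v1
kind: Secret
metadata:
  name: ceph-secret
data:
  key: "REPLACE_WITH_BASE64_ENCODED_KEY"   # base64 of the client.admin key
```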
15.5.2.2. Creating the Persistent Volume
Developers request Ceph RBD storage by referencing either a PVC or the Ceph RBD volume plug-in directly in the volumes section of a pod specification. A PVC exists only in the user’s namespace and can be referenced only by pods within that same namespace. Any attempt to access a PV from a different namespace causes the pod to fail.
					
- Define the PV in an object definition before creating it in OpenShift Enterprise:
  Example 15.8. Persistent Volume Object Definition Using Ceph RBD
  - 1
  - The name of the PV that is referenced in pod definitions or displayed in various oc volume commands.
  - 2
  - The amount of storage allocated to this volume.
  - 3
  - accessModes are used as labels to match a PV and a PVC. They currently do not define any form of access control. All block storage is defined to be single user (non-shared storage).
  - 4
  - The volume type being used, in this case the rbd plug-in.
  - 5
  - An array of Ceph monitor IP addresses and ports.
  - 6
  - The Ceph secret used to create a secure connection from OpenShift Enterprise to the Ceph server.
  - 7
  - The file system type mounted on the Ceph RBD block device.
  Important: Changing the value of the fsType parameter after the volume has been formatted and provisioned can result in data loss and pod failure.
- Save your definition to a file, for example ceph-pv.yaml, and create the PV:
  # oc create -f ceph-pv.yaml
- Verify that the persistent volume was created:
  # oc get pv
  NAME      LABELS    CAPACITY     ACCESSMODES   STATUS      CLAIM     REASON    AGE
  ceph-pv   <none>    2147483648   RWO           Available                       2s
- Create a PVC that will bind to the new PV:
- Save the definition to a file, for example ceph-claim.yaml, and create the PVC:
  # oc create -f ceph-claim.yaml
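The PV and PVC used in the steps above might be sketched as follows. The PV name, capacity, access mode, and secret name follow the output shown in this section; the monitor address, pool, image, user, and claim name are hypothetical placeholders. The two documents would be saved to ceph-pv.yaml and ceph-claim.yaml respectively.

```yaml
# Sketch of a Ceph RBD-backed PV.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: ceph-pv                  # (1) PV name
spec:
  capacity:
    storage: 2Gi                 # (2) 2147483648 bytes in the output above
  accessModes:
    - ReadWriteOnce              # (3) block storage is single user
  rbd:                           # (4) rbd volume plug-in
    monitors:                    # (5) Ceph monitor addresses and ports
      - 192.0.2.10:6789          # placeholder
    pool: rbd                    # placeholder
    image: ceph-image            # placeholder Ceph image name
    user: admin                  # placeholder
    secretRef:
      name: ceph-secret          # (6) secret created earlier
    fsType: ext4                 # (7) file system type (placeholder)
    readOnly: false
---
# Sketch of a PVC that could bind to the PV above.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ceph-claim               # hypothetical
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi
```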
15.5.3. Ceph Volume Security
See the full Volume Security topic before implementing Ceph RBD volumes.
					A significant difference between shared volumes (NFS and GlusterFS) and block volumes (Ceph RBD, iSCSI, and most cloud storage), is that the user and group IDs defined in the pod definition or docker image are applied to the target physical storage. This is referred to as managing ownership of the block device. For example, if the Ceph RBD mount has its owner set to 123 and its group ID set to 567, and if the pod defines its runAsUser set to 222 and its fsGroup to be 7777, then the Ceph RBD physical mount’s ownership will be changed to 222:7777.
				
Even if the user and group IDs are not defined in the pod specification, the resulting pod may have defaults defined for these IDs based on its matching SCC, or its project. See the full Volume Security topic which covers storage aspects of SCCs and defaults in greater detail.
					A pod defines the group ownership of a Ceph RBD volume using the fsGroup stanza under the pod’s securityContext definition:
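As a sketch, the fragment below sets fsGroup under the pod-level securityContext, using the group ID 7777 from the ownership example above. The rest of the pod spec is omitted.

```yaml
# Fragment of a pod spec that sets group ownership of the block volume.
spec:
  securityContext:     # pod-level security context
    fsGroup: 7777      # group ID applied to the Ceph RBD volume
```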
				
15.6. Persistent Storage Using AWS Elastic Block Store
15.6.1. Overview
OpenShift supports AWS Elastic Block Store (EBS) volumes. You can provision your OpenShift cluster with persistent storage using AWS EC2. Some familiarity with Kubernetes and AWS is assumed.
Before creating persistent volumes using AWS, OpenShift must first be properly configured for AWS Elastic Block Store.
The Kubernetes persistent volume framework allows administrators to provision a cluster with persistent storage and gives users a way to request those resources without having any knowledge of the underlying infrastructure. Persistent volumes are not bound to a single project or namespace; they can be shared across the OpenShift cluster. Persistent volume claims, however, are specific to a project or namespace and can be requested by users.
High-availability of storage in the infrastructure is left to the underlying storage provider.
15.6.2. Provisioning
					Storage must exist in the underlying infrastructure before it can be mounted as a volume in OpenShift. After ensuring OpenShift is configured for AWS Elastic Block Store, all that is required for OpenShift and AWS is an AWS EBS volume ID and the PersistentVolume API.
				
15.6.2.1. Creating the Persistent Volume
You must define your persistent volume in an object definition before creating it in OpenShift:
Example 15.10. Persistent Volume Object Definition Using AWS
- 1
- The name of the volume. This will be how it is identified via persistent volume claims or from pods.
- 2
- The amount of storage allocated to this volume.
- 3
- This defines the volume type being used, in this case the awsElasticBlockStore plug-in.
- 4
- File system type to mount.
- 5
- This is the AWS volume that will be used.
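A minimal sketch of a definition consistent with the callouts above. The name, capacity, and access mode are taken from the oc get pv output later in this section; the ext4 file system and the volume ID placeholder are assumptions for illustration only:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv0001                  # (1) name of the volume
spec:
  capacity:
    storage: 5Gi                # (2) storage allocated to this volume
  accessModes:
    - ReadWriteOnce             # reported as RWO by oc get pv
  awsElasticBlockStore:         # (3) volume type (plug-in) being used
    fsType: ext4                # (4) file system type to mount (assumed)
    volumeID: <aws-volume-id>   # (5) placeholder for the existing EBS volume ID
```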
Changing the value of the fsType parameter after the volume has been formatted and provisioned can result in data loss and pod failure.
						
Save your definition to a file, for example aws-pv.yaml, and create the persistent volume:
# oc create -f aws-pv.yaml
persistentvolume "pv0001" created
Verify that the persistent volume was created:
# oc get pv
NAME      LABELS    CAPACITY   ACCESSMODES   STATUS      CLAIM     REASON    AGE
pv0001    <none>    5Gi        RWO           Available                       2s
Users can then request storage using persistent volume claims, which can now utilize your new persistent volume.
Persistent volume claims only exist in the user’s namespace and can only be referenced by a pod within that same namespace. Any attempt to access a persistent volume from a different namespace causes the pod to fail.
15.6.2.2. Volume Format
						Before OpenShift mounts the volume and passes it to a container, it checks that it contains a file system as specified by the fsType parameter in the persistent volume definition. If the device is not formatted with the file system, all data from the device is erased and the device is automatically formatted with the given file system.
					
This allows using unformatted AWS volumes as persistent volumes, because OpenShift Enterprise formats them before the first use.
15.7. Persistent Storage Using GCE Persistent Disk
15.7.1. Overview
OpenShift supports GCE Persistent Disk volumes (gcePD). You can provision your OpenShift cluster with persistent storage using GCE. Some familiarity with Kubernetes and GCE is assumed.
Before creating persistent volumes using GCE, OpenShift must first be properly configured for GCE Persistent Disk.
The Kubernetes persistent volume framework allows administrators to provision a cluster with persistent storage and gives users a way to request those resources without having any knowledge of the underlying infrastructure. Persistent volumes are not bound to a single project or namespace; they can be shared across the OpenShift cluster. Persistent volume claims, however, are specific to a project or namespace and can be requested by users.
High-availability of storage in the infrastructure is left to the underlying storage provider.
15.7.2. Provisioning
Storage must exist in the underlying infrastructure before it can be mounted as a volume in OpenShift. After ensuring OpenShift is configured for GCE PersistentDisk, all that is required for OpenShift and GCE is a GCE Persistent Disk volume ID and the PersistentVolume API.
				
15.7.2.1. Creating the Persistent Volume
You must define your persistent volume in an object definition before creating it in OpenShift:
Example 15.11. Persistent Volume Object Definition Using GCE
- 1
- The name of the volume. This will be how it is identified via persistent volume claims or from pods.
- 2
- The amount of storage allocated to this volume.
- 3
- This defines the volume type being used, in this case the gcePersistentDisk plug-in.
- 4
- File system type to mount.
- 5
- This is the GCE Persistent Disk volume that will be used.
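A minimal sketch of a definition consistent with the callouts above. The name, capacity, and access mode are taken from the oc get pv output later in this section; the ext4 file system and the disk name placeholder are assumptions for illustration only:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv0001                  # (1) name of the volume
spec:
  capacity:
    storage: 5Gi                # (2) storage allocated to this volume
  accessModes:
    - ReadWriteOnce             # reported as RWO by oc get pv
  gcePersistentDisk:            # (3) volume type (plug-in) being used
    fsType: ext4                # (4) file system type to mount (assumed)
    pdName: <gce-disk-name>     # (5) placeholder for the existing GCE Persistent Disk
```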
Changing the value of the fsType parameter after the volume has been formatted and provisioned can result in data loss and pod failure.
						
Save your definition to a file, for example gce-pv.yaml, and create the persistent volume:
# oc create -f gce-pv.yaml
persistentvolume "pv0001" created
Verify that the persistent volume was created:
# oc get pv
NAME      LABELS    CAPACITY   ACCESSMODES   STATUS      CLAIM     REASON    AGE
pv0001    <none>    5Gi        RWO           Available                       2s
Users can then request storage using persistent volume claims, which can now utilize your new persistent volume.
Persistent volume claims only exist in the user’s namespace and can only be referenced by a pod within that same namespace. Any attempt to access a persistent volume from a different namespace causes the pod to fail.
15.7.2.2. Volume Format
						Before OpenShift mounts the volume and passes it to a container, it checks that it contains a file system as specified by the fsType parameter in the persistent volume definition. If the device is not formatted with the file system, all data from the device is erased and the device is automatically formatted with the given file system.
					
This allows using unformatted GCE volumes as persistent volumes, because OpenShift Enterprise formats them before the first use.
15.8. Persistent Storage Using iSCSI
15.8.1. Overview
You can provision your OpenShift cluster with persistent storage using iSCSI. Some familiarity with Kubernetes and iSCSI is assumed.
The Kubernetes persistent volume framework allows administrators to provision a cluster with persistent storage and gives users a way to request those resources without having any knowledge of the underlying infrastructure.
High-availability of storage in the infrastructure is left to the underlying storage provider.
15.8.2. Provisioning
Storage must exist in the underlying infrastructure before it can be mounted as a volume in OpenShift. All that is required for iSCSI is the iSCSI target portal, a valid iSCSI IQN, a valid LUN number, the file system type, and the PersistentVolume API.
				
Example 15.12. Persistent Volume Object Definition
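A minimal sketch of such a definition, covering the target portal, IQN, LUN number, and file system type listed above. Every concrete value here (name, capacity, portal address, IQN, LUN, and file system) is hypothetical and must be replaced with the values of your own iSCSI target:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: iscsi-pv                   # hypothetical name
spec:
  capacity:
    storage: 1Gi                   # hypothetical size
  accessModes:
    - ReadWriteOnce
  iscsi:
    targetPortal: 10.0.0.1:3260    # hypothetical iSCSI target portal (address:port)
    iqn: iqn.2016-04.com.example:storage.target00   # hypothetical IQN
    lun: 0                         # hypothetical LUN number
    fsType: ext4                   # assumed file system type
```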
15.8.2.1. Enforcing Disk Quotas
Use LUN partitions to enforce disk quotas and size constraints. Each LUN is one persistent volume. Kubernetes enforces unique names for persistent volumes.
Enforcing quotas in this way allows the end user to request persistent storage by a specific amount (for example, 10Gi) and be matched with a corresponding volume of equal or greater capacity.
15.8.2.2. iSCSI Volume Security
						Users request storage with a PersistentVolumeClaim. This claim only lives in the user’s namespace and can only be referenced by a pod within that same namespace. Any attempt to access a persistent volume across a namespace causes the pod to fail.
					
Each iSCSI LUN must be accessible by all nodes in the cluster.
15.9. Persistent Storage Using Fibre Channel
15.9.1. Overview
You can provision your OpenShift cluster with persistent storage using Fibre Channel. Some familiarity with Kubernetes and Fibre Channel is assumed.
The Kubernetes persistent volume framework allows administrators to provision a cluster with persistent storage and gives users a way to request those resources without having any knowledge of the underlying infrastructure.
High-availability of storage in the infrastructure is left to the underlying storage provider.
15.9.2. Provisioning
Storage must exist in the underlying infrastructure before it can be mounted as a volume in OpenShift. All that is required for Fibre Channel persistent storage is the targetWWNs (an array of the Fibre Channel target’s World Wide Names), a valid LUN number, the file system type, and the PersistentVolume API. Note that the number of LUNs must correspond to the number of persistent volumes that are created. In the example below, the LUN is 2; therefore, two persistent volume definitions have been created.
				
Example 15.13. Persistent Volumes Object Definition
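A minimal sketch of one such definition, covering the targetWWNs, LUN number, and file system type listed above. It shows a single persistent volume only; any further LUN would need its own definition with a distinct name. The name, capacity, WWN strings, and file system are hypothetical:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: fc-pv0001                  # hypothetical name, must be unique per volume
spec:
  capacity:
    storage: 1Gi                   # hypothetical size
  accessModes:
    - ReadWriteOnce
  fc:
    targetWWNs:                    # hypothetical World Wide Names of the target
      - '5000000000000001'
      - '5000000000000002'
    lun: 2                         # LUN number, as mentioned in the text above
    fsType: ext4                   # assumed file system type
```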
Changing the value of the fsType parameter after the volume has been formatted and provisioned can result in data loss and pod failure.
					
15.9.2.1. Enforcing Disk Quotas
Use LUN partitions to enforce disk quotas and size constraints. Each LUN is one persistent volume. Kubernetes enforces unique names for persistent volumes.
Enforcing quotas in this way allows the end user to request persistent storage by a specific amount (for example, 10Gi) and be matched with a corresponding volume of equal or greater capacity.
15.9.2.2. Fibre Channel Volume Security
						Users request storage with a PersistentVolumeClaim. This claim only lives in the user’s namespace and can only be referenced by a pod within that same namespace. Any attempt to access a persistent volume across a namespace causes the pod to fail.
					
Each Fibre Channel LUN must be accessible by all nodes in the cluster.
15.10. Dynamically Provisioning Persistent Volumes
15.10.1. Overview
You can provision your OpenShift cluster with storage dynamically when running in a cloud environment. The Kubernetes persistent volume framework allows administrators to provision a cluster with persistent storage and gives users a way to request those resources without having any knowledge of the underlying infrastructure.
Many storage types are available for use as persistent volumes in OpenShift. While all of them can be statically provisioned by an administrator, some types of storage can be created dynamically using an API. These types of storage can be provisioned in an OpenShift cluster using the new and experimental dynamic storage feature.
Dynamic provisioning of persistent volumes is currently a Technology Preview feature, introduced in OpenShift Enterprise 3.1.1. This feature is experimental and expected to change in the future as it matures and feedback is received from users. New ways to provision the cluster are planned and the means by which one accesses this feature is going to change. Backwards compatibility is not guaranteed.
15.10.2. Enabling Provisioner Plug-ins
OpenShift provides the following provisioner plug-ins, which have generic implementations for dynamic provisioning that use the cluster’s configured cloud provider’s API to create new storage resources:
| Storage Type | Provisioner Plug-in Name | Required Cloud Configuration | Notes |
|---|---|---|---|
| OpenStack Cinder | | | |
| AWS Elastic Block Store (EBS) | | | For dynamic provisioning when using multiple clusters in different zones, each node must be tagged with … |
| GCE Persistent Disk (gcePD) | | | In multi-zone configurations, PVs must be created in the same region/zone as the master node. Do this by setting the … |
For any chosen provisioner plug-ins, the relevant cloud configuration must also be set up, per Required Cloud Configuration in the above table.
When your OpenShift cluster is configured for EBS, GCE, or Cinder, the associated provisioner plug-in is implied and automatically enabled. No additional OpenShift configuration by the cluster administrator is required for dynamic provisioning.
For example, if your OpenShift cluster is configured to run in AWS, the EBS provisioner plug-in is automatically available for creating dynamically provisioned storage requested by a user.
Future provisioner plug-ins will include the many types of storage a single provider offers. AWS, for example, has several types of EBS volumes to offer, each with its own performance characteristics; there is also an NFS-like storage option. More provisioner plug-ins will be implemented for the supported storage types available in OpenShift.
15.10.3. Requesting Dynamically Provisioned Storage
Users can request dynamically provisioned storage by including a storage class annotation in their persistent volume claim:
Example 15.14. Persistent Volume Claim Requesting Dynamic Storage
- 1
- The value of the volume.alpha.kubernetes.io/storage-class annotation is not meaningful at this time. The presence of the annotation, with any arbitrary value, triggers provisioning using the single implied provisioner plug-in per cloud.
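A minimal sketch of a claim carrying this annotation. The claim name, annotation value, access mode, and requested size are hypothetical; per the callout above, only the presence of the annotation matters:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: claim1                                      # hypothetical claim name
  annotations:
    volume.alpha.kubernetes.io/storage-class: foo   # (1) arbitrary value
spec:
  accessModes:
    - ReadWriteOnce                                 # hypothetical access mode
  resources:
    requests:
      storage: 3Gi                                  # hypothetical requested size
```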
15.10.4. Volume Recycling
					Volumes created dynamically by a provisioner have their persistentVolumeReclaimPolicy set to Delete. When a persistent volume claim is deleted, its backing persistent volume is considered released of its claim, and that resource can be reclaimed by the cluster. Dynamic provisioning utilizes the provider’s API to delete the volume from the provider and then removes the persistent volume from the cluster.
				
15.11. Volume Security
15.11.1. Overview
This topic provides a general guide on pod security as it relates to volume security. For information on pod-level security in general, see Managing Security Context Constraints (SCC) and the Security Context Constraint concept topic. For information on the OpenShift persistent volume (PV) framework in general, see the Persistent Storage concept topic.
Accessing persistent storage requires coordination between the cluster and/or storage administrator and the end developer. The cluster administrator creates PVs, which abstract the underlying physical storage. The developer creates pods and, optionally, PVCs, which bind to PVs, based on matching criteria, such as capacity.
Multiple persistent volume claims (PVCs) within the same project can bind to the same PV. However, once a PVC binds to a PV, that PV cannot be bound by a claim outside of the first claim’s project. If the underlying storage needs to be accessed by multiple projects, then each project needs its own PV, which can point to the same physical storage. In this sense, a bound PV is tied to a project. For a detailed PV and PVC example, see the guide for WordPress and MySQL using NFS.
For the cluster administrator, granting pods access to PVs involves:
- knowing the group ID and/or user ID assigned to the actual storage,
- understanding SELinux considerations, and
- ensuring that these IDs are allowed in the range of legal IDs defined for the project and/or the SCC that matches the requirements of the pod.
Group IDs, the user ID, and SELinux values are defined in the SecurityContext section in a pod definition. Group IDs are global to the pod and apply to all containers defined in the pod. User IDs can also be global, or specific to each container. Four sections control access to volumes: supplementalGroups, fsGroup, runAsUser, and seLinuxOptions.
				
15.11.2. SCCs, Defaults, and Allowed Ranges
					SCCs influence whether or not a pod is given a default user ID, fsGroup ID, supplemental group ID, and SELinux label. They also influence whether or not IDs supplied in the pod definition (or in the image) will be validated against a range of allowable IDs. If validation is required and fails, then the pod will also fail.
				
					SCCs define strategies, such as runAsUser, supplementalGroups, and fsGroup. These strategies help decide whether the pod is authorized. Strategy values set to RunAsAny are essentially stating that the pod can do what it wants regarding that strategy. Authorization is skipped for that strategy and no OpenShift default is produced based on that strategy. Therefore, IDs and SELinux labels in the resulting container are based on container defaults instead of OpenShift policies.
				
For a quick summary of RunAsAny:
- Any ID defined in the pod definition (or image) is allowed.
- Absence of an ID in the pod definition (and in the image) results in the container assigning an ID, which is root (0) for Docker.
- No SELinux labels are defined, so Docker will assign a unique label.
For these reasons, SCCs with RunAsAny for ID-related strategies should be protected so that ordinary developers do not have access to the SCC. On the other hand, SCC strategies set to MustRunAs or MustRunAsRange trigger ID validation (for ID-related strategies), and cause default values to be supplied by OpenShift to the container when those values are not supplied directly in the pod definition or image.
SCCs may define the range of allowed IDs (user or groups). If range checking is required (for example, using MustRunAs) and the allowable range is not defined in the SCC, then the project determines the ID range. Therefore, projects support ranges of allowable IDs. However, unlike SCCs, projects do not define strategies, such as runAsUser.
				
Allowable ranges are helpful not only because they define the boundaries for container IDs, but also because the minimum value in the range becomes the default value for the ID in question. For example, if the SCC ID strategy value is MustRunAs, the minimum value of an ID range is 100, and the ID is absent from the pod definition, then 100 is provided as the default for this ID.
As part of pod admission, the SCCs available to a pod are examined (roughly, in priority order followed by most restrictive) to best match the requests of the pod. Setting an SCC’s strategy type to RunAsAny is less restrictive, whereas a type of MustRunAs is more restrictive. All of these strategies are evaluated. To see which SCC was assigned to a pod, use the oc get pod command:
				
- 1
- Name of the SCC that the pod used (in this case, a custom SCC).
- 2
- Name of the pod.
- 3
- Name of the project. "Namespace" is interchangeable with "project" in OpenShift. See Projects and Users for details.
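A sketch of the relevant part of the output, assuming the pod is retrieved in YAML form (for example, with oc get pod <pod_name> -o yaml). The pod name is hypothetical; the SCC name my-custom-scc is the custom SCC used later in this topic:

```yaml
metadata:
  annotations:
    openshift.io/scc: my-custom-scc   # (1) SCC that the pod used
  name: example-pod                   # (2) hypothetical pod name
  namespace: default                  # (3) project (namespace)
```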
It may not be immediately obvious which SCC was matched by a pod, so the command above can be very useful in understanding the UID, supplemental groups, and SELinux relabeling in a live container.
					Any SCC with a strategy set to RunAsAny allows specific values for that strategy to be defined in the pod definition (and/or image). When this applies to the user ID (runAsUser) it is prudent to restrict access to the SCC to prevent a container from being able to run as root.
				
Because pods often match the restricted SCC, it is worth knowing the security this entails. The restricted SCC has the following characteristics:
- User IDs are constrained due to the runAsUser strategy being set to MustRunAsRange. This forces user ID validation.
- Because a range of allowable user IDs is not defined in the SCC (see oc export scc restricted for more details), the project’s openshift.io/sa.scc.uid-range range will be used for range checking and for a default ID, if needed.
- A default user ID is produced when a user ID is not specified in the pod definition due to runAsUser being set to MustRunAsRange.
- An SELinux label is required (seLinuxContext set to MustRunAs), which uses the project’s default MCS label.
- Arbitrary supplemental group IDs are allowed because no range checking is required. This is a result of both the supplementalGroups and fsGroup strategies being set to RunAsAny.
- Default supplemental groups are not produced for the running pod due to RunAsAny for the two group strategies above. Therefore, if no groups are defined in the pod definition (or in the image), the container(s) will have no supplemental groups predefined.
The following shows the default project and a custom SCC (my-custom-scc), which summarizes the interactions of the SCC and the project:
- 1
- default is the name of the project.
- 2
- Default values are only produced when the corresponding SCC strategy is not RunAsAny.
- 3
- SELinux default when not defined in the pod definition or in the SCC.
- 4
- Range of allowable group IDs. ID validation only occurs when the SCC strategy is not RunAsAny. There can be more than one range specified, separated by commas. See below for supported formats.
- 5
- Same as callout 4, but for user IDs. Also, only a single range of user IDs is supported.
- 6 10
- MustRunAs enforces group ID range checking and provides the container’s groups default. Based on this SCC definition, the default is 5000 (the minimum ID value). If the range was omitted from the SCC, then the default would be 1000000000 (derived from the project). The other supported type, RunAsAny, does not perform range checking, thus allowing any group ID, and produces no default groups.
- 7
- MustRunAsRange enforces user ID range checking and provides a UID default. Based on this SCC, the default UID is 65534 (the minimum value). If the minimum and maximum range were omitted from the SCC, the default user ID would be 1000000000 (derived from the project). MustRunAsNonRoot and RunAsAny are the other supported types. The range of allowed IDs can be defined to include any user IDs required for the target storage.
- 8
- When set to MustRunAs, the container is created with the SCC’s SELinux options, or the MCS default defined in the project. A type of RunAsAny indicates that SELinux context is not required, and, if not defined in the pod, is not set in the container.
- 9
- The SELinux user name, role name, type, and labels can be defined here.
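A sketch of what the two objects could look like, reconstructed only from the callouts above. The group minimum of 5000, the UID minimum of 65534, and the project default of 1000000000 come from the callouts; the MCS label, the range counts, the maximum values, and the SELinux user placeholder are assumptions:

```yaml
# Project, for example as shown by: oc export project default
apiVersion: v1
kind: Project
metadata:
  name: default                                                # (1)
  annotations:                                                 # (2)
    openshift.io/sa.scc.mcs: s0:c1,c0                          # (3) assumed MCS label
    openshift.io/sa.scc.supplemental-groups: 1000000000/10000  # (4) count is assumed
    openshift.io/sa.scc.uid-range: 1000000000/10000            # (5) count is assumed
---
# Custom SCC, for example as shown by: oc export scc my-custom-scc
apiVersion: v1
kind: SecurityContextConstraints
metadata:
  name: my-custom-scc
fsGroup:
  type: MustRunAs            # (6)
  ranges:
  - min: 5000                # default fsGroup
    max: 6000                # assumed maximum
runAsUser:
  type: MustRunAsRange       # (7)
  uidRangeMin: 65534         # default UID
  uidRangeMax: 65634         # assumed maximum
seLinuxContext:
  type: MustRunAs            # (8)
  seLinuxOptions:            # (9) role, type, and level can also be set here
    user: <selinux-user-name>
supplementalGroups:
  type: MustRunAs            # (10)
  ranges:
  - min: 5000                # default supplemental group
    max: 6000                # assumed maximum
```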
Two formats are supported for allowed ranges:
- M/N, where M is the starting ID and N is the count, so the range becomes M through (and including) M+N-1.
- M-N, where M is again the starting ID and N is the ending ID.
The default group ID is the starting ID in the first range, which is 1000000000 in this project. If the SCC did not define a minimum group ID, then the project’s default ID is applied.
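As a worked example of the two formats, the following project annotation fragment uses the M/N form with an assumed count of 10000. With M = 1000000000 and N = 10000, the last allowed ID is M+N-1 = 1000009999:

```yaml
metadata:
  annotations:
    # M/N form: start 1000000000, count 10000 (assumed),
    # so the range is 1000000000 through 1000009999.
    openshift.io/sa.scc.supplemental-groups: 1000000000/10000
    # The same range in M-N form would be written as 1000000000-1000009999.
```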
15.11.3. Supplemental Groups
Read SCCs, Defaults, and Allowed Ranges before working with supplemental groups.
					Supplemental groups are regular Linux groups. When a process runs in Linux, it has a UID, a GID, and one or more supplemental groups. These attributes can be set for a container’s main process. The supplementalGroups IDs are typically used for controlling access to shared storage, such as NFS and GlusterFS, whereas fsGroup is used for controlling access to block storage, such as Ceph RBD and iSCSI.
				
The OpenShift shared storage plug-ins mount volumes such that the POSIX permissions on the mount match the permissions on the target storage. For example, if the target storage’s owner ID is 1234 and its group ID is 5678, then the mount on the host node and in the container will have those same IDs. Therefore, the container’s main process must match one or both of those IDs in order to access the volume.
For example, consider the following NFS export.
On an OpenShift node:
showmount requires access to the ports used by rpcbind and rpc.mountd on the NFS server.

# showmount -e <nfs-server-ip-or-hostname>
Export list for f21-nfs.vm:
/opt/nfs  *
On the NFS server:
In the above, the owner is 65534 (nfsnobody), but the suggestions and examples in this topic apply to any non-root owner.
The /opt/nfs/ export is accessible by UID 65534 and the group 5555. In general, containers should not run as root, so in this NFS example, containers that do not run as UID 65534 or are not members of the group 5555 will not be able to access the NFS export.
Often, the SCC matching the pod does not allow a specific user ID to be specified, thus using supplemental groups is a more flexible way to grant storage access to a pod. For example, to grant NFS access to the export above, the group 5555 can be defined in the pod definition:
- 1
- Name of the volume mount. Must match the name in the volumes section.
- 2
- NFS export path as seen in the container.
- 3
- Pod global security context. Applies to all containers in the pod. Each container can also define its securityContext; however, group IDs are global to the pod and cannot be defined for individual containers.
- 4
- Supplemental groups, which is an array of IDs, is set to 5555. This grants group access to the export.
- 5
- Name of the volume. Must match the name in the volumeMounts section.
- 6
- Actual NFS export path on the NFS server.
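A minimal sketch of a pod definition consistent with the callouts above. The group 5555, the /opt/nfs export, and the server placeholder come from this example; the pod name, volume name, and mount path are hypothetical:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nfs-pod                        # hypothetical pod name
spec:
  containers:
  - name: ...
    volumeMounts:
    - name: nfs                        # (1) must match the name under volumes
      mountPath: /usr/share/nfs-data   # (2) hypothetical path inside the container
  securityContext:                     # (3) pod-level security context
    supplementalGroups: [5555]         # (4) grants group access to the export
  volumes:
  - name: nfs                          # (5) must match the name under volumeMounts
    nfs:
      server: <nfs-server-ip-or-hostname>
      path: /opt/nfs                   # (6) export path on the NFS server
```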
					All containers in the above pod (assuming the matching SCC or project allows the group 5555) will be members of the group 5555 and have access to the volume, regardless of the container’s user ID. However, the assumption above is critical. Sometimes, the SCC does not define a range of allowable group IDs but requires group ID validation (due to supplementalGroups set to MustRunAs; note this is not the case for the restricted SCC). The project will not likely allow a group ID of 5555, unless the project has been customized for access to this NFS export. So in this scenario, the above pod will fail because its group ID of 5555 is not within the SCC’s or the project’s range of allowed group IDs.
				
Supplemental Groups and Custom SCCs
To remedy the situation in the previous example, a custom SCC can be created such that:
- a minimum and maximum group ID are defined,
- ID range checking is enforced, and
- the group ID of 5555 is allowed.
It is better to create new SCCs versus modifying a predefined SCC, or changing the range of allowed IDs in the predefined projects.
The easiest way to create a new SCC is to export an existing SCC and customize the YAML file to meet the requirements of the new SCC. For example:
- Use the restricted SCC as a template for the new SCC:
  $ oc export scc restricted > new-scc.yaml
- Edit the new-scc.yaml file to your desired specifications.
- Create the new SCC:
  $ oc create -f new-scc.yaml
						The oc edit scc command can be used to modify an instantiated SCC.
					
Here is a fragment of a new SCC named nfs-scc:
- 1
- The allow* bools are the same as for the restricted SCC.
- 2
- Name of the new SCC.
- 3
- Numerically larger numbers have greater priority. Nil or omitted is the lowest priority. Higher priority SCCs sort before lower priority SCCs and thus have a better chance of matching a new pod.
- 4
- supplementalGroups is a strategy and it is set to MustRunAs, which means group ID checking is required.
- 5
- Multiple ranges are supported. The allowed group ID range here is 5000 through 5999, with the default supplemental group being 5000.
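A sketch of such a fragment, consistent with the callouts above. The SCC name and the 5000 through 5999 range come from the text; the priority value is hypothetical, and only two of the allow* fields are shown:

```yaml
allowHostDirVolumePlugin: false    # (1) allow* bools as in the restricted SCC
allowPrivilegedContainer: false
kind: SecurityContextConstraints
metadata:
  name: nfs-scc                    # (2) name of the new SCC
priority: 9                        # (3) hypothetical priority value
supplementalGroups:
  type: MustRunAs                  # (4) group ID checking is required
  ranges:
  - min: 5000                      # (5) default supplemental group
    max: 5999
```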
When the same pod shown earlier runs against this new SCC (assuming, of course, the pod has access to the new SCC), it will start because the group 5555, supplied in the pod definition, is now allowed by the custom SCC.
15.11.4. fsGroup
Read SCCs, Defaults, and Allowed Ranges before working with fsGroup.
					It is generally preferable to use group IDs (supplemental or fsGroup) to gain access to persistent storage versus using user IDs.
				
					fsGroup defines a pod’s "file system group" ID, which is added to the container’s supplemental groups. The supplementalGroups ID applies to shared storage, whereas the fsGroup ID is used for block storage.
				
Block storage, such as Ceph RBD, iSCSI, and various cloud storage, is typically dedicated to a single pod which has requested the block storage volume, either directly or using a PVC. Unlike shared storage, block storage is taken over by a pod, meaning that user and group IDs supplied in the pod definition (or image) are applied to the actual, physical block device. Typically, block storage is not shared.
An fsGroup definition is shown in the following pod definition fragment:
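A minimal sketch of such a fragment, using the group ID 5555 discussed in the next paragraph. The container entry is a placeholder:

```yaml
kind: Pod
spec:
  containers:
  - name: ...
  securityContext:     # pod-level; fsGroup cannot be set per container
    fsGroup: 5555      # added to the supplemental groups of every container
```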
				
					As with supplementalGroups, all containers in the above pod (assuming the matching SCC or project allows the group 5555) will be members of the group 5555, and will have access to the block volume, regardless of the container’s user ID. If the pod matches the restricted SCC, whose fsGroup strategy is RunAsAny, then any fsGroup ID (including 5555) will be accepted. However, if the SCC has its fsGroup strategy set to MustRunAs, and 5555 is not in the allowable range of fsGroup IDs, then the pod will fail to run.
				
fsGroups and Custom SCCs
To remedy the situation in the previous example, a custom SCC can be created such that:
- a minimum and maximum group ID are defined,
- ID range checking is enforced, and
- the group ID of 5555 is allowed.
It is better to create new SCCs versus modifying a predefined SCC, or changing the range of allowed IDs in the predefined projects.
Consider the following fragment of a new SCC definition:
- 1
- MustRunAs triggers group ID range checking, whereas RunAsAny does not require range checking.
- 2
- Multiple ranges are supported. The allowed group ID range here is 5000 through 5999 (inclusive), with the default fsGroup being 5000.
- 3
- The minimum value (or the entire range) can be omitted from the SCC, and thus range checking and generating a default value will defer to the project’s openshift.io/sa.scc.supplemental-groups range. fsGroup and supplementalGroups use the same group field in the project; there is not a separate range for fsGroup.
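A sketch of such an SCC fragment, consistent with the callouts above. The range of 5000 through 5999 comes from the text; the other SCC fields are omitted:

```yaml
kind: SecurityContextConstraints
fsGroup:
  type: MustRunAs    # (1) triggers group ID range checking
  ranges:            # (2) more than one range may be listed
  - min: 5000        # (3) default fsGroup; may be omitted to defer to the project
    max: 5999
```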
When the pod shown above runs against this new SCC (assuming, of course, the pod has access to the new SCC), it will start because the group 5555, supplied in the pod definition, is allowed by the custom SCC. Additionally, the pod will "take over" the block device, so when the block storage is viewed by a process outside of the pod, it will actually have 5555 as its group ID.
Currently, the volumes that support block ownership management include:
- AWS Elastic Block Store
- OpenStack Cinder
- Ceph RBD
- GCE Persistent Disk
- iSCSI
- emptyDir
- gitRepo
15.11.5. User IDs
Read SCCs, Defaults, and Allowed Ranges before working with user IDs.
It is generally preferable to use group IDs (supplemental or fsGroup) to gain access to persistent storage versus using user IDs.
User IDs can be defined in the container image or in the pod definition. In the pod definition, a single user ID can be defined globally to all containers, or specific to individual containers (or both). A user ID is supplied as shown in the pod definition fragment below:
spec:
  containers:
  - name: ...
    securityContext:
      runAsUser: 65534
					ID 65534 in the above is container-specific and matches the owner ID on the export. If the NFS export’s owner ID was 54321, then that number would be used in the pod definition. Specifying securityContext outside of the container definition makes the ID global to all containers in the pod.
				
					Similar to group IDs, user IDs may be validated according to policies set in the SCC and/or project. If the SCC’s runAsUser strategy is set to RunAsAny, then any user ID defined in the pod definition or in the image is allowed.
				
This means even a UID of 0 (root) is allowed.
					If, instead, the runAsUser strategy is set to MustRunAsRange, then a supplied user ID will be validated against a range of allowed IDs. If the pod supplies no user ID, then the default ID is the minimum value of the range of allowable user IDs.
				
Returning to the earlier NFS example, the container needs its UID set to 65534, which is shown in the pod fragment above. Assuming the default project and the restricted SCC, the pod’s requested user ID of 65534 will not be allowed, and therefore the pod will fail. The pod fails because:
- it requests 65534 as its user ID,
- all available SCCs use MustRunAsRange for their runAsUser strategy, so UID range checking is required, and
- 65534 is not included in the SCC or project’s user ID range.
To address this situation, the recommended path would be to create a new SCC with the appropriate user ID range. A new project could also be created with the appropriate user ID range defined. There are other, less-preferred options:
- The restricted SCC could be modified to include 65534 within its minimum and maximum user ID range. This is not recommended as you should avoid modifying the predefined SCCs if possible.
- The restricted SCC could be modified to use RunAsAny for the runAsUser value, thus eliminating ID range checking. This is strongly not recommended, as containers could run as root.
- The default project’s UID range could be changed to allow a user ID of 65534. This is not generally advisable because only a single range of user IDs can be specified.
User IDs and Custom SCCs
It is good practice to avoid modifying the predefined SCCs if possible. The preferred approach is to create a custom SCC that better fits an organization’s security needs, or create a new project that supports the desired user IDs.
To remedy the situation in the previous example, a custom SCC can be created such that:
- a minimum and maximum user ID are defined,
- UID range checking is still enforced, and
- the UID of 65534 will be allowed.
For example:
- 1
- The allow* bools are the same as for the restricted SCC.
- 2
- The name of this new SCC is nfs-scc.
- 3
- Numerically larger values have higher priority. Nil or omitted is the lowest priority. Higher priority SCCs sort before lower priority SCCs, and thus have a better chance of matching a new pod.
- 4
- The runAsUser strategy is set to MustRunAsRange, which means UID range checking is enforced.
- 5
- The UID range is 65534 through 65534 (a range of one value).
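The SCC definition that these callouts annotate is not reproduced in this section. As a minimal sketch, assuming the field names of the OpenShift 3 SecurityContextConstraints object, such a definition might look like the following fragment. The priority value of 9 is illustrative only, and the trailing comments map each line to the callout numbers above.

```yaml
# Hypothetical fragment of a custom SCC; fields not shown are assumed to
# match the restricted SCC.
allowHostDirVolumePlugin: false   # (1) allow* bools as in the restricted SCC
kind: SecurityContextConstraints
metadata:
  name: nfs-scc                   # (2) name of the new SCC
priority: 9                       # (3) illustrative value; larger values sort first
requiredDropCapabilities: null
runAsUser:
  type: MustRunAsRange            # (4) UID range checking is enforced
  uidRangeMax: 65534              # (5) range of exactly one value
  uidRangeMin: 65534
```

Keeping MustRunAsRange with a one-value range, rather than switching to RunAsAny, allows UID 65534 while still preventing containers from running as root.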
					Now, with runAsUser: 65534 shown in the previous pod definition fragment, the pod matches the new nfs-scc and is able to run with a UID of 65534.
				
15.11.6. SELinux Options
					All predefined SCCs, except for the privileged SCC, set the seLinuxContext to MustRunAs. So the SCCs most likely to match a pod’s requirements will force the pod to use an SELinux policy. The SELinux policy used by the pod can be defined in the pod itself, in the image, in the SCC, or in the project (which provides the default).
				
SELinux labels can be defined in a pod’s securityContext.seLinuxOptions section, which supports user, role, type, and level:
				
Level and MCS label are used interchangeably in this topic.
...
 securityContext: 
    seLinuxOptions:
      level: "s0:c123,c456" 
...
Here are fragments from an SCC and from the default project:
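The fragments themselves are not reproduced in this section. A minimal sketch of what they might contain, assuming the seLinuxContext field of the SCC object and the openshift.io/sa.scc.mcs project annotation; the MCS value shown is illustrative and will differ per project:

```yaml
# Fragment of an SCC: MustRunAs requires the pod to run with an SELinux label.
seLinuxContext:
  type: MustRunAs
---
# Fragment of the default project: this annotation is assumed to supply the
# default MCS label when neither the pod nor the SCC defines one.
metadata:
  annotations:
    openshift.io/sa.scc.mcs: s0:c1,c0   # illustrative value
```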
					All predefined SCCs, except for the privileged SCC, set the seLinuxContext to MustRunAs. This forces pods to use MCS labels, which can be defined in the pod definition, the image, or provided as a default.
				
					The SCC determines whether or not to require an SELinux label and can provide a default label. If the seLinuxContext strategy is set to MustRunAs and the pod (or image) does not define a label, OpenShift defaults to a label chosen from the SCC itself or from the project.
				
					If seLinuxContext is set to RunAsAny, then no default labels are provided, and the container determines the final label. In the case of Docker, the container will use a unique MCS label, which will not likely match the labeling on existing storage mounts. Volumes which support SELinux management will be relabeled so that they are accessible by the specified label and, depending on how exclusionary the label is, only that label.
				
This means two things for unprivileged containers:
- The volume will be given a type which is accessible by unprivileged containers. This type is usually svirt_sandbox_file_t.
- If a level is specified, the volume will be labeled with the given MCS label.
For a volume to be accessible by a pod, the pod must have both categories of the volume. So a pod with s0:c1,c2 will be able to access a volume with s0:c1,c2. A volume with s0 will be accessible by all pods.
If pods fail authorization, or if the storage mount is failing due to permissions errors, then there is a possibility that SELinux enforcement is interfering. One way to check for this is to run:
# ausearch -m avc --start recent
This examines the log file for AVC (Access Vector Cache) errors.