Chapter 7. About GPU time slicing
GPU time slicing enables multiple workloads to share a single physical GPU by dividing its processing time into short, alternating time slots. This method improves resource utilization, reduces idle GPU time, and allows multiple users to run AI/ML workloads concurrently in OpenShift AI. The NVIDIA GPU Operator manages this scheduling based on a time-slicing-config config map that defines the number of GPU slices for each physical GPU.
Time-slicing differs from Multi-Instance GPU (MIG) partitioning. While MIG provides memory and fault isolation, time-slicing shares the same GPU memory across workloads without strict isolation. Time-slicing is ideal for lightweight inference tasks, data preprocessing, and other scenarios where full GPU isolation is unnecessary.
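Workloads request a time-sliced GPU in the same way that they request a dedicated one. The following is a minimal sketch of a pod that requests a single GPU slice; the pod name and container image are placeholders, and the nvidia.com/gpu resource name assumes the default time-slicing configuration in which slices are not renamed:

  apiVersion: v1
  kind: Pod
  metadata:
    name: time-sliced-example    # placeholder name
  spec:
    restartPolicy: Never
    containers:
    - name: app
      image: nvcr.io/nvidia/cuda:12.4.1-base-ubi9   # placeholder CUDA image
      command: ["nvidia-smi"]                       # prints the GPU that the slice maps to
      resources:
        limits:
          nvidia.com/gpu: 1                         # one time slice, not a full physical GPU

Because the device plugin advertises each slice as a separate nvidia.com/gpu resource, a node with one physical GPU and four replicas can schedule up to four such pods concurrently.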
Consider the following points when using GPU time slicing:
- Memory sharing: All workloads share GPU memory. High memory usage by one workload can impact others.
- Performance trade-offs: While time slicing allows multiple workloads to share a GPU, it does not provide strict resource isolation like MIG.
- GPU compatibility: Time slicing is supported on specific NVIDIA GPUs.
7.1. Enabling GPU time slicing
To enable GPU time slicing in OpenShift AI, you must configure the NVIDIA GPU Operator to allow multiple workloads to share a single GPU.
Prerequisites
- You have logged in to OpenShift.
- You have the cluster-admin role in OpenShift.
- You have installed and configured the NVIDIA GPU Operator.
- The relevant nodes in your deployment contain NVIDIA GPUs.
- The GPU in your deployment supports time slicing.
- You have installed the OpenShift command line interface (oc) as described in Installing the OpenShift CLI.
Procedure
Create a config map named time-slicing-config in the namespace that is used by the GPU Operator. For NVIDIA GPUs, this is the nvidia-gpu-operator namespace.
- Log in to the OpenShift web console as a cluster administrator.
- In the Administrator perspective, navigate to Workloads → ConfigMaps.
- On the ConfigMap details page, click the Create Config Map button.
- On the Create Config Map page, for Configure via, select YAML view.
- In the Data field, enter the YAML code for the relevant GPU. Here is an example of a time-slicing-config config map for an NVIDIA T4 GPU:

  Note:
  - You can change the number of replicas to control the number of GPU slices available for each physical GPU.
  - Increasing replicas might increase the risk of Out of Memory (OOM) errors if workloads exceed available GPU memory.

  apiVersion: v1
  kind: ConfigMap
  metadata:
    name: time-slicing-config
  data:
    tesla-t4: |-
      version: v1
      flags:
        migStrategy: none
      sharing:
        timeSlicing:
          renameByDefault: false
          failRequestsGreaterThanOne: false
          resources:
            - name: nvidia.com/gpu
              replicas: 4
- Click Create.
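If you prefer the command line, you can create the same config map without the web console. The following is a minimal sketch that assumes the example YAML above is saved to a local file named time-slicing-config.yaml (a placeholder file name):

  oc apply -f time-slicing-config.yaml -n nvidia-gpu-operator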
Update the gpu-cluster-policy cluster policy to reference the time-slicing-config config map:
- In the Administrator perspective, navigate to Operators → Installed Operators.
- Search for the NVIDIA GPU Operator, and then click the Operator name to open the Operator details page.
- Click the ClusterPolicy tab.
- Select the gpu-cluster-policy resource from the list to open the ClusterPolicy details page.
- Click the YAML tab and update the spec.devicePlugin section to reference the time-slicing-config config map. Here is an example of a gpu-cluster-policy cluster policy for an NVIDIA T4 GPU:

  apiVersion: nvidia.com/v1
  kind: ClusterPolicy
  metadata:
    name: gpu-cluster-policy
  spec:
    devicePlugin:
      config:
        default: tesla-t4
        name: time-slicing-config
- Click Save.
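Alternatively, you can make the same change from the command line. The following is a minimal sketch that applies a merge patch to the cluster policy; it assumes the default gpu-cluster-policy resource name and the tesla-t4 configuration key shown above:

  oc patch clusterpolicy gpu-cluster-policy --type merge \
    -p '{"spec": {"devicePlugin": {"config": {"name": "time-slicing-config", "default": "tesla-t4"}}}}'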
Label the relevant machine set to apply time slicing:
- In the Administrator perspective, navigate to Compute → MachineSets.
- Select the machine set for GPU time slicing from the list.
- On the MachineSet details page, click the YAML tab and update the spec.template.spec.metadata.labels section to label the relevant machine set. Here is an example of a machine set with the appropriate machine label for an NVIDIA T4 GPU:

  spec:
    template:
      spec:
        metadata:
          labels:
            nvidia.com/device-plugin.config: tesla-t4
- Click Save.
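If you manage labels from the command line, a merge patch on the machine set achieves the same result. The machine set name below is a placeholder, and the sketch assumes the machine set lives in the standard openshift-machine-api namespace:

  oc patch machineset <machine-set-name> -n openshift-machine-api --type merge \
    -p '{"spec": {"template": {"spec": {"metadata": {"labels": {"nvidia.com/device-plugin.config": "tesla-t4"}}}}}}'

Note that label changes in the machine set template typically apply only to machines provisioned after the change; existing nodes are not relabeled automatically.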
Verification
- Log in to the OpenShift CLI.
- Verify that you have applied the config map correctly:

  oc get configmap time-slicing-config -n nvidia-gpu-operator -o yaml

- Check that the cluster policy includes the time-slicing configuration:

  oc get clusterpolicy gpu-cluster-policy -o yaml

- Ensure that the label is applied to nodes:

  oc get nodes --show-labels | grep nvidia.com/device-plugin.config
If workloads do not appear to be sharing the GPU, verify that the NVIDIA device plugin is running and that the correct labels are applied.
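You can also confirm that the device plugin is advertising the expected number of GPU slices. The following sketch inspects a labeled GPU node (replace <node-name> with one of your node names); with the example configuration above, the nvidia.com/gpu capacity and allocatable values should report four slices for each physical GPU rather than one:

  oc describe node <node-name> | grep nvidia.com/gpu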