Chapter 2. Leader Worker Set Operator
2.1. Leader Worker Set Operator overview
Using large language models (LLMs) for AI/ML inference often requires significant compute resources, and workloads typically must be sharded across multiple nodes. This can make deployments complex, creating challenges around scaling, recovery from failures, and efficient pod placement.
The Leader Worker Set Operator simplifies these multi-node deployments by treating a group of pods as a single, coordinated unit. It manages the lifecycle of each pod in the group, scales the entire group together, and performs updates and failure recovery at the group level to ensure consistency.
2.1.1. About the Leader Worker Set Operator
The Leader Worker Set Operator is based on the LeaderWorkerSet open source project. LeaderWorkerSet is a custom Kubernetes API that can be used to deploy a group of pods as a unit. This is useful for artificial intelligence (AI) and machine learning (ML) inference workloads, where large language models (LLMs) are sharded across multiple nodes.
With the LeaderWorkerSet API, pods are grouped into units consisting of one leader and multiple workers, all managed together as a single entity. Each pod in a group has a unique pod identity. Pods within a group are created in parallel and share identical lifecycle stages. Rollouts, rolling updates, and pod failure restarts are performed as a group.
In the LeaderWorkerSet configuration, you define the size of the groups and the number of group replicas. If necessary, you can define separate templates for leader and worker pods, allowing for role-specific customization. You can also configure topology-aware placement, so that pods in the same group are co-located in the same topology.
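As a rough illustration of topology-aware placement, the upstream LeaderWorkerSet project documents an exclusive-topology annotation on the LeaderWorkerSet resource. The fragment below is a sketch under that assumption. The resource name, the container image, and the topology key topology.kubernetes.io/zone are placeholders chosen for illustration, and it is worth confirming against the upstream documentation that the annotation is honored by the Operator version you install.

```yaml
apiVersion: leaderworkerset.x-k8s.io/v1
kind: LeaderWorkerSet
metadata:
  name: my-lws                      # placeholder name
  annotations:
    # Assumption: upstream annotation. The value is a node label key that
    # defines the topology domain in which each group is placed.
    leaderworkerset.sigs.k8s.io/exclusive-topology: topology.kubernetes.io/zone
spec:
  replicas: 2                       # number of leader-worker groups
  leaderWorkerTemplate:
    size: 3                         # pods per group, leader included
    workerTemplate:
      spec:
        containers:
        - name: worker              # placeholder container
          image: nginxinc/nginx-unprivileged:1.27   # placeholder image
```

Because no leaderTemplate is given in this sketch, the leader pod is assumed to fall back to the worker template, as described in the upstream API.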
Before you install the Leader Worker Set Operator, you must install the cert-manager Operator for Red Hat OpenShift because it is required to configure services and manage metrics collection.
Monitoring for the Leader Worker Set Operator is provided by default with OpenShift Container Platform through Prometheus.
2.1.1.1. LeaderWorkerSet architecture
The following diagram shows how the LeaderWorkerSet API organizes groups of pods into a single unit, with one pod as the leader and the rest as the workers, to coordinate distributed workloads:
Figure 2.1. Leader worker set architecture
The LeaderWorkerSet API uses a leader stateful set to manage the deployment and lifecycle of the groups of pods. For each replica defined, a leader-worker group is created.
Each leader-worker group contains a leader pod and a worker stateful set. The worker stateful set is owned by the leader pod and manages the set of worker pods associated with that leader pod. The specified size defines the total number of pods in each leader-worker group, with the leader pod included in that number.
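To make the sizing concrete, the following fragment sketches only the two sizing fields of a LeaderWorkerSet spec. The values are illustrative, and the field paths (spec.replicas and spec.leaderWorkerTemplate.size) follow the upstream LeaderWorkerSet API rather than anything shown in this section.

```yaml
# Partial LeaderWorkerSet spec: sizing fields only, illustrative values.
spec:
  replicas: 2            # 2 leader-worker groups
  leaderWorkerTemplate:
    size: 3              # 3 pods per group: 1 leader + 2 workers
# Objects that result, following the description above:
#   1 leader stateful set managing 2 leader pods (one per group)
#   2 worker stateful sets, each owned by a leader pod and managing 2 workers
#   2 groups x 3 pods = 6 pods in total
```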
2.2. Leader Worker Set Operator release notes
You can use the Leader Worker Set Operator to manage distributed inference workloads and process large-scale inference requests efficiently.
These release notes track the development of the Leader Worker Set Operator.
For more information, see About the Leader Worker Set Operator.
2.2.1. Release notes for Leader Worker Set Operator 1.0.0
Issued: 18 September 2025
The following advisories are available for the Leader Worker Set Operator 1.0.0:
2.2.1.1. New features and enhancements
- This is the initial release of the Leader Worker Set Operator.
2.3. Managing distributed workloads with the Leader Worker Set Operator
You can use the Leader Worker Set Operator to manage distributed inference workloads and process large-scale inference requests efficiently.
2.3.1. Installing the Leader Worker Set Operator
You can use the web console to install the Leader Worker Set Operator.
Prerequisites
- You have access to the cluster with cluster-admin privileges.
- You have access to the OpenShift Container Platform web console.
- You have installed the cert-manager Operator for Red Hat OpenShift.
Procedure
- Log in to the OpenShift Container Platform web console.
- Verify that the cert-manager Operator for Red Hat OpenShift is installed.
- Install the Leader Worker Set Operator.
  - Navigate to Operators → OperatorHub.
  - Enter Leader Worker Set Operator into the filter box.
  - Select the Leader Worker Set Operator and click Install.
  - On the Install Operator page:
    - The Update channel is set to stable-v1.0, which installs the latest stable release of Leader Worker Set Operator 1.0.
    - Under Installation mode, select A specific namespace on the cluster.
    - Under Installed Namespace, select Operator recommended Namespace: openshift-lws-operator.
    - Under Update approval, select one of the following update strategies:
      - The Automatic strategy allows Operator Lifecycle Manager (OLM) to automatically update the Operator when a new version is available.
      - The Manual strategy requires a user with appropriate credentials to approve the Operator update.
    - Click Install.
- Create the custom resource (CR) for the Leader Worker Set Operator:
  - Navigate to Installed Operators → Leader Worker Set Operator.
  - Under Provided APIs, click Create instance in the LeaderWorkerSetOperator pane.
  - Click Create.
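For reference, the LeaderWorkerSetOperator instance created in the last step of the procedure might correspond to a manifest along the following lines. This is a sketch, not the form that the web console necessarily generates. The name cluster and the openshift-lws-operator namespace come from this chapter, while the apiVersion and the managementState field are assumptions based on common OpenShift Operator conventions and should be checked against the CRD installed on your cluster.

```yaml
# Sketch of the LeaderWorkerSetOperator custom resource.
apiVersion: operator.openshift.io/v1   # assumption: verify against the installed CRD
kind: LeaderWorkerSetOperator
metadata:
  name: cluster                        # instance name referenced in the uninstall procedure
  namespace: openshift-lws-operator    # Operator recommended namespace
spec:
  managementState: Managed             # assumption: standard Operator management field
```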
2.3.2. Deploying a leader worker set
You can use the Leader Worker Set Operator to deploy a leader worker set to assist with managing distributed workloads across nodes.
Prerequisites
- You have installed the Leader Worker Set Operator.
Procedure
- Create a new project by running the following command:
  $ oc new-project my-namespace
- Create a file named leader-worker-set.yaml that sets the following fields:
  1. Specify the name of the leader worker set resource.
  2. Specify the namespace for the leader worker set to run in.
  3. Specify the pod template for the leader pods.
  4. Specify the restart policy for when pod failures occur. Allowed values are RecreateGroupOnPodRestart to restart the whole group or None to not restart the group.
  5. Specify the number of pods to create for each group, including the leader pod. For example, a value of 3 creates 1 leader pod and 2 worker pods. The default value is 1.
  6. Specify the pod template for the worker pods.
  7. Specify the policy to use when creating the headless service. Allowed values are UniquePerReplica or Shared. The default value is Shared.
  8. Specify the number of replicas, or leader-worker groups. The default value is 1.
  9. Specify the maximum number of replicas that can be scheduled above the replicas value during rolling updates. The value can be specified as an integer or a percentage.
  For more information on all available fields, see the upstream LeaderWorkerSet API documentation.
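A leader-worker-set.yaml that matches the numbered descriptions above might look like the following sketch. The field names follow the upstream LeaderWorkerSet API. The resource name my-lws, the container names, the image, and the port are placeholders chosen for illustration, and the numbers in the comments refer to the descriptions above.

```yaml
apiVersion: leaderworkerset.x-k8s.io/v1
kind: LeaderWorkerSet
metadata:
  name: my-lws                              # (1) placeholder resource name
  namespace: my-namespace                   # (2)
spec:
  leaderWorkerTemplate:
    leaderTemplate:                         # (3)
      spec:
        containers:
        - name: leader                      # placeholder container
          image: nginxinc/nginx-unprivileged:1.27   # placeholder image
    restartPolicy: RecreateGroupOnPodRestart        # (4)
    size: 3                                 # (5) 1 leader pod + 2 worker pods
    workerTemplate:                         # (6)
      spec:
        containers:
        - name: worker                      # placeholder container
          image: nginxinc/nginx-unprivileged:1.27   # placeholder image
          ports:
          - containerPort: 8080
            protocol: TCP
  networkConfig:
    subdomainPolicy: Shared                 # (7)
  replicas: 2                               # (8)
  rolloutStrategy:
    type: RollingUpdate
    rollingUpdateConfiguration:
      maxSurge: 1                           # (9)
```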
- Apply the leader worker set configuration by running the following command:
  $ oc apply -f leader-worker-set.yaml
Verification
- Verify that pods were created by running the following command:
  $ oc get pods -n my-namespace
- Review the stateful sets by running the following command:
  $ oc get statefulsets
  Example output
  NAME       READY   AGE
  my-lws     4/4     111s
  my-lws-0   2/2     57s
  my-lws-1   2/2     60s
2.4. Uninstalling the Leader Worker Set Operator
You can remove the Leader Worker Set Operator from OpenShift Container Platform by uninstalling the Operator and removing its related resources.
2.4.1. Uninstalling the Leader Worker Set Operator
You can use the web console to uninstall the Leader Worker Set Operator.
Prerequisites
- You have access to the cluster with cluster-admin privileges.
- You have access to the OpenShift Container Platform web console.
- You have installed the Leader Worker Set Operator.
Procedure
- Log in to the OpenShift Container Platform web console.
- Navigate to Operators → Installed Operators.
- Select openshift-lws-operator from the Project dropdown list.
- Delete the LeaderWorkerSetOperator instance.
  - Click Leader Worker Set Operator and select the LeaderWorkerSetOperator tab.
  - Click the Options menu next to the cluster entry and select Delete LeaderWorkerSetOperator.
  - In the confirmation dialog, click Delete.
- Uninstall the Leader Worker Set Operator.
  - Navigate to Operators → Installed Operators.
  - Click the Options menu next to the Leader Worker Set Operator entry and click Uninstall Operator.
  - In the confirmation dialog, click Uninstall.
2.4.2. Uninstalling Leader Worker Set Operator resources
Optionally, after uninstalling the Leader Worker Set Operator, you can remove its related resources from your cluster.
Prerequisites
- You have access to the cluster with cluster-admin privileges.
- You have access to the OpenShift Container Platform web console.
- You have uninstalled the Leader Worker Set Operator.
Procedure
- Log in to the OpenShift Container Platform web console.
- Remove CRDs that were created when the Leader Worker Set Operator was installed:
  - Navigate to Administration → CustomResourceDefinitions.
  - Enter LeaderWorkerSetOperator in the Name field to filter the CRDs.
  - Click the Options menu next to the LeaderWorkerSetOperator CRD and select Delete CustomResourceDefinition.
  - In the confirmation dialog, click Delete.
- Delete the openshift-lws-operator namespace.
  - Navigate to Administration → Namespaces.
  - Enter openshift-lws-operator into the filter box.
  - Click the Options menu next to the openshift-lws-operator entry and select Delete Namespace.
  - In the confirmation dialog, enter openshift-lws-operator and click Delete.