Chapter 2. Leader Worker Set Operator


2.1. Leader Worker Set Operator overview

Using large language models (LLMs) for AI/ML inference often requires significant compute resources, and workloads typically must be sharded across multiple nodes. This can make deployments complex, creating challenges around scaling, recovery from failures, and efficient pod placement.

The Leader Worker Set Operator simplifies these multi-node deployments by treating a group of pods as a single, coordinated unit. It manages the lifecycle of each pod in the group, scales the entire group together, and performs updates and failure recovery at the group level to ensure consistency.

2.1.1. About the Leader Worker Set Operator

The Leader Worker Set Operator is based on the LeaderWorkerSet open source project. LeaderWorkerSet is a custom Kubernetes API that can be used to deploy a group of pods as a unit. This is useful for artificial intelligence (AI) and machine learning (ML) inference workloads, where large language models (LLMs) are sharded across multiple nodes.

With the LeaderWorkerSet API, pods are grouped into units consisting of one leader and multiple workers, all managed together as a single entity. Each pod in a group has a unique pod identity. Pods within a group are created in parallel and share identical lifecycle stages. Rollouts, rolling updates, and pod failure restarts are performed as a group.

In the LeaderWorkerSet configuration, you define the size of the groups and the number of group replicas. If necessary, you can define separate templates for leader and worker pods, allowing for role-specific customization. You can also configure topology-aware placement, so that pods in the same group are co-located in the same topology domain.

Important

Before you install the Leader Worker Set Operator, you must install the cert-manager Operator for Red Hat OpenShift because it is required to configure services and manage metrics collection.

Monitoring for the Leader Worker Set Operator is provided by default with OpenShift Container Platform through Prometheus.

2.1.1.1. LeaderWorkerSet architecture

The following diagram shows how the LeaderWorkerSet API organizes groups of pods into a single unit, with one pod as the leader and the rest as the workers, to coordinate distributed workloads:

Figure 2.1. Leader worker set architecture

The LeaderWorkerSet API uses a leader stateful set to manage the deployment and lifecycle of the groups of pods. For each replica defined, a leader-worker group is created.

Each leader-worker group contains a leader pod and a worker stateful set. The worker stateful set is owned by the leader pod and manages the set of worker pods associated with that leader pod. The specified size defines the total number of pods in each leader-worker group, with the leader pod included in that number.
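
The sizing rules above can be sketched as a small shell function. This is an illustrative sketch only: the function name lws_resource_counts is hypothetical and is not part of the Operator or the oc CLI, and it assumes a group size of at least 2 so that every group has a worker stateful set.

```shell
# Hypothetical helper (not part of oc or the Operator): derive the resources
# that the description above implies for a given replicas and size value.
# Assumes size >= 2, so each group has at least one worker pod.
lws_resource_counts() {
  replicas=$1
  size=$2
  leader_pods=$replicas                    # one leader pod per group
  worker_pods=$((replicas * (size - 1)))   # size includes the leader pod
  total_pods=$((replicas * size))
  stateful_sets=$((1 + replicas))          # 1 leader set + 1 worker set per group
  echo "leader_pods=$leader_pods worker_pods=$worker_pods total_pods=$total_pods stateful_sets=$stateful_sets"
}

lws_resource_counts 2 3
```

With 2 replicas and a size of 3, the sketch reports 2 leader pods, 4 worker pods, 6 pods in total, and 3 stateful sets.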

2.2. Leader Worker Set Operator release notes

You can use the Leader Worker Set Operator to manage distributed inference workloads and process large-scale inference requests efficiently.

These release notes track the development of the Leader Worker Set Operator.

For more information, see About the Leader Worker Set Operator.

2.2.1. Release notes for Leader Worker Set Operator 1.0.0

Issued: 18 September 2025

The following advisories are available for the Leader Worker Set Operator 1.0.0:

2.2.1.1. New features and enhancements

  • This is the initial release of the Leader Worker Set Operator.

You can use the Leader Worker Set Operator to manage distributed inference workloads and process large-scale inference requests efficiently.

2.3.1. Installing the Leader Worker Set Operator

You can use the web console to install the Leader Worker Set Operator.

Prerequisites

  • You have access to the cluster with cluster-admin privileges.
  • You have access to the OpenShift Container Platform web console.
  • You have installed the cert-manager Operator for Red Hat OpenShift.

Procedure

  1. Log in to the OpenShift Container Platform web console.
  2. Verify that the cert-manager Operator for Red Hat OpenShift is installed.
  3. Install the Leader Worker Set Operator.

    1. Navigate to Operators → OperatorHub.
    2. Enter Leader Worker Set Operator into the filter box.
    3. Select the Leader Worker Set Operator and click Install.
    4. On the Install Operator page:

      1. The Update channel is set to stable-v1.0, which installs the latest stable release of Leader Worker Set Operator 1.0.
      2. Under Installation mode, select A specific namespace on the cluster.
      3. Under Installed Namespace, select Operator recommended Namespace: openshift-lws-operator.
      4. Under Update approval, select one of the following update strategies:

        • The Automatic strategy allows Operator Lifecycle Manager (OLM) to automatically update the Operator when a new version is available.
        • The Manual strategy requires a user with appropriate credentials to approve the Operator update.
      5. Click Install.
  4. Create the custom resource (CR) for the Leader Worker Set Operator:

    1. Navigate to Installed Operators → Leader Worker Set Operator.
    2. Under Provided APIs, click Create instance in the LeaderWorkerSetOperator pane.
    3. Click Create.
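
If you prefer to confirm the installation from the command line, a minimal sketch follows. It assumes that the oc CLI is installed and logged in to the cluster, and that you accepted the recommended openshift-lws-operator namespace. The wrapper function check_lws_operator is hypothetical and exists only to group the two commands.

```shell
# Hypothetical wrapper: list the Operator's ClusterServiceVersion and pods.
# Assumes the Operator was installed in the recommended namespace.
check_lws_operator() {
  ns=openshift-lws-operator
  if ! command -v oc >/dev/null 2>&1; then
    echo "oc CLI not found; skipping check" >&2
    return 0
  fi
  # The PHASE column of the ClusterServiceVersion should show Succeeded.
  oc get csv -n "$ns"
  # The pods in the Operator namespace should reach the Running status.
  oc get pods -n "$ns"
}

check_lws_operator
```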

2.3.2. Deploying a leader worker set

You can use the Leader Worker Set Operator to deploy a leader worker set to assist with managing distributed workloads across nodes.

Prerequisites

  • You have installed the Leader Worker Set Operator.

Procedure

  1. Create a new project by running the following command:

    $ oc new-project my-namespace
  2. Create a file named leader-worker-set.yaml with the following content:

    apiVersion: leaderworkerset.x-k8s.io/v1
    kind: LeaderWorkerSet
    metadata:
      generation: 1
      name: my-lws # <1>
      namespace: my-namespace # <2>
    spec:
      leaderWorkerTemplate:
        leaderTemplate: # <3>
          metadata: {}
          spec:
            containers:
            - image: nginxinc/nginx-unprivileged:1.27
              name: leader
              resources: {}
        restartPolicy: RecreateGroupOnPodRestart # <4>
        size: 3 # <5>
        workerTemplate: # <6>
          metadata: {}
          spec:
            containers:
            - image: nginxinc/nginx-unprivileged:1.27
              name: worker
              ports:
              - containerPort: 8080
                protocol: TCP
              resources: {}
      networkConfig:
        subdomainPolicy: Shared # <7>
      replicas: 2 # <8>
      rolloutStrategy:
        rollingUpdateConfiguration:
          maxSurge: 1 # <9>
          maxUnavailable: 1
        type: RollingUpdate
      startupPolicy: LeaderCreated

    <1> Specify the name of the leader worker set resource.
    <2> Specify the namespace for the leader worker set to run in.
    <3> Specify the pod template for the leader pods.
    <4> Specify the restart policy for when pod failures occur. Allowed values are RecreateGroupOnPodRestart to restart the whole group or None to not restart the group.
    <5> Specify the number of pods to create for each group, including the leader pod. For example, a value of 3 creates 1 leader pod and 2 worker pods. The default value is 1.
    <6> Specify the pod template for the worker pods.
    <7> Specify the policy to use when creating the headless service. Allowed values are UniquePerReplica or Shared. The default value is Shared.
    <8> Specify the number of replicas, or leader-worker groups. The default value is 1.
    <9> Specify the maximum number of replicas that can be scheduled above the replicas value during rolling updates. The value can be specified as an integer or a percentage.

    For more information on all available fields to configure, see LeaderWorkerSet API upstream documentation.

  3. Apply the leader worker set configuration by running the following command:

    $ oc apply -f leader-worker-set.yaml
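
The maxSurge value described in callout 9 accepts either an integer or a percentage. The following sketch shows one way to resolve such a value into a replica count. It assumes that percentages are rounded up, as Kubernetes does for the maxSurge field of a Deployment; confirm this against the upstream LeaderWorkerSet API documentation before relying on it. The function name lws_max_surge is hypothetical.

```shell
# Hypothetical helper: resolve a maxSurge value (integer or percentage)
# into an absolute number of extra replicas.
# Assumption: percentages round up, as for Kubernetes Deployments.
lws_max_surge() {
  value=$1
  replicas=$2
  case "$value" in
    *%)
      percent=${value%\%}
      # Integer ceiling of replicas * percent / 100.
      echo $(( (replicas * percent + 99) / 100 ))
      ;;
    *)
      # An integer value is used as is.
      echo "$value"
      ;;
  esac
}

lws_max_surge 1 2
lws_max_surge 25% 2
```

Under that assumption, both calls resolve to 1, so a leader worker set with 2 replicas could temporarily run up to 3 groups during a rolling update.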

Verification

  1. Verify that pods were created by running the following command:

    $ oc get pods -n my-namespace

    Example output

    NAME         READY   STATUS    RESTARTS   AGE
    my-lws-0     1/1     Running   0          4s    <1>
    my-lws-0-1   1/1     Running   0          3s
    my-lws-0-2   1/1     Running   0          3s
    my-lws-1     1/1     Running   0          7s    <2>
    my-lws-1-1   1/1     Running   0          6s
    my-lws-1-2   1/1     Running   0          6s

    <1> The leader pod for the first group.
    <2> The leader pod for the second group.
  2. Review the stateful sets by running the following command:

    $ oc get statefulsets -n my-namespace

    Example output

    NAME       READY   AGE
    my-lws     2/2     111s   <1>
    my-lws-0   2/2     57s    <2>
    my-lws-1   2/2     60s    <3>

    <1> The leader stateful set for all leader-worker groups. Its ready count matches the number of replicas, because it manages one leader pod per group.
    <2> The worker stateful set for the first group.
    <3> The worker stateful set for the second group.
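
As a cross-check for the verification output above, the following sketch prints the pod names that a leader worker set is expected to produce. The naming pattern (name-group for leader pods, name-group-index for worker pods) is inferred from the example output in this procedure, and lws_pod_names is a hypothetical helper rather than an oc subcommand.

```shell
# Hypothetical helper: print the pod names expected for a leader worker set.
# Naming pattern inferred from the example output in this procedure:
#   leader pod: <name>-<group>    worker pod: <name>-<group>-<index>
lws_pod_names() {
  name=$1
  replicas=$2
  size=$3
  group=0
  while [ "$group" -lt "$replicas" ]; do
    echo "${name}-${group}"               # leader pod, counted in size
    worker=1
    while [ "$worker" -lt "$size" ]; do
      echo "${name}-${group}-${worker}"   # size - 1 worker pods per group
      worker=$((worker + 1))
    done
    group=$((group + 1))
  done
}

lws_pod_names my-lws 2 3
```

You could compare this list with the output of oc get pods -n my-namespace -o name, which prefixes each name with pod/.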

2.4. Uninstalling the Leader Worker Set Operator

You can remove the Leader Worker Set Operator from OpenShift Container Platform by uninstalling the Operator and removing its related resources.

2.4.1. Uninstalling the Leader Worker Set Operator

You can use the web console to uninstall the Leader Worker Set Operator.

Prerequisites

  • You have access to the cluster with cluster-admin privileges.
  • You have access to the OpenShift Container Platform web console.
  • You have installed the Leader Worker Set Operator.

Procedure

  1. Log in to the OpenShift Container Platform web console.
  2. Navigate to Operators → Installed Operators.
  3. Select openshift-lws-operator from the Project dropdown list.
  4. Delete the LeaderWorkerSetOperator instance.

    1. Click Leader Worker Set Operator and select the LeaderWorkerSetOperator tab.
    2. Click the Options menu (kebab icon) next to the cluster entry and select Delete LeaderWorkerSetOperator.
    3. In the confirmation dialog, click Delete.
  5. Uninstall the Leader Worker Set Operator.

    1. Navigate to Operators → Installed Operators.
    2. Click the Options menu (kebab icon) next to the Leader Worker Set Operator entry and click Uninstall Operator.
    3. In the confirmation dialog, click Uninstall.

2.4.2. Uninstalling Leader Worker Set Operator resources

Optionally, after uninstalling the Leader Worker Set Operator, you can remove its related resources from your cluster.

Prerequisites

  • You have access to the cluster with cluster-admin privileges.
  • You have access to the OpenShift Container Platform web console.
  • You have uninstalled the Leader Worker Set Operator.

Procedure

  1. Log in to the OpenShift Container Platform web console.
  2. Remove CRDs that were created when the Leader Worker Set Operator was installed:

    1. Navigate to Administration → CustomResourceDefinitions.
    2. Enter LeaderWorkerSetOperator in the Name field to filter the CRDs.
    3. Click the Options menu (kebab icon) next to the LeaderWorkerSetOperator CRD and select Delete CustomResourceDefinition.
    4. In the confirmation dialog, click Delete.
  3. Delete the openshift-lws-operator namespace.

    1. Navigate to Administration → Namespaces.
    2. Enter openshift-lws-operator into the filter box.
    3. Click the Options menu (kebab icon) next to the openshift-lws-operator entry and select Delete Namespace.
    4. In the confirmation dialog, enter openshift-lws-operator and click Delete.