
Chapter 3. Leader Worker Set Operator


3.1. Leader Worker Set Operator overview

Use the Leader Worker Set Operator to manage multi-node AI/ML inference deployments efficiently. The Leader Worker Set Operator treats groups of pods as one unit to simplify scaling, recovery, and updates for large workloads.

Using large language models (LLMs) for AI/ML inference often requires significant compute resources, and workloads typically must be sharded across multiple nodes. This can make deployments complex, creating challenges around scaling, recovery from failures, and efficient pod placement.

The Leader Worker Set Operator simplifies these multi-node deployments by treating a group of pods as a single, coordinated unit. It manages the lifecycle of each pod in the group, scales the entire group together, and performs updates and failure recovery at the group level to ensure consistency.

3.1.1. About the Leader Worker Set Operator

Use the Leader Worker Set Operator to deploy groups of pods as a single, manageable unit. This helps you to deploy large AI/ML inference workloads, such as sharded large language models (LLMs).

The Leader Worker Set Operator is based on the LeaderWorkerSet open source project. LeaderWorkerSet is a custom Kubernetes API that can be used to deploy a group of pods as a unit. This is useful for artificial intelligence (AI) and machine learning (ML) inference workloads, where large language models (LLMs) are sharded across multiple nodes.

With the LeaderWorkerSet API, pods are grouped into units consisting of one leader and multiple workers, all managed together as a single entity. Each pod in a group has a unique pod identity. Pods within a group are created in parallel and share identical lifecycle stages. Rollouts, rolling updates, and pod failure restarts are performed as a group.

In the LeaderWorkerSet configuration, you define the size of the groups and the number of group replicas. If necessary, you can define separate templates for leader and worker pods, allowing for role-specific customization. You can also configure topology-aware placement, so that pods in the same group are co-located in the same topology.
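
For example, the following sketch shows how these options fit together: the group size, the number of group replicas, separate leader and worker templates, and an annotation for topology-aware placement. The exclusive-topology annotation key is taken from the upstream LeaderWorkerSet documentation and the example-lws name is hypothetical; a complete, tested example is shown later in this chapter.

    apiVersion: leaderworkerset.x-k8s.io/v1
    kind: LeaderWorkerSet
    metadata:
      name: example-lws                       # hypothetical name
      annotations:
        # Assumption: upstream annotation for topology-aware (exclusive) placement;
        # verify the exact key against the LeaderWorkerSet version you install.
        leaderworkerset.sigs.k8s.io/exclusive-topology: topology.kubernetes.io/zone
    spec:
      replicas: 2                             # two leader-worker groups
      leaderWorkerTemplate:
        size: 4                               # pods per group, leader included
        leaderTemplate:                       # role-specific template for the leader pod
          spec:
            containers:
            - name: leader
              image: nginxinc/nginx-unprivileged:1.27
        workerTemplate:                       # template for the worker pods
          spec:
            containers:
            - name: worker
              image: nginxinc/nginx-unprivileged:1.27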

Important

Before you install the Leader Worker Set Operator, you must install the cert-manager Operator for Red Hat OpenShift because it is required to configure services and manage metrics collection.

Monitoring for the Leader Worker Set Operator is provided by default with OpenShift Container Platform through Prometheus.
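
To confirm that metrics collection is wired up after the Operator is installed, one quick check is to look for a ServiceMonitor object in the Operator namespace. Whether the Operator creates a ServiceMonitor, and what it is named, are assumptions here; this is only a sketch of a possible verification step.

    $ oc get servicemonitors -n openshift-lws-operator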

3.1.1.1. LeaderWorkerSet architecture

Review the LeaderWorkerSet architecture to learn how the LeaderWorkerSet API organizes groups of pods into a single unit, with one pod as the leader and the rest as the workers, to coordinate distributed workloads.

The following diagram describes the LeaderWorkerSet architecture:

Figure 3.1. Leader worker set architecture

The LeaderWorkerSet API uses a leader stateful set to manage the deployment and lifecycle of the groups of pods. For each replica defined, a leader-worker group is created.

Each leader-worker group contains a leader pod and a worker stateful set. The worker stateful set is owned by the leader pod and manages the set of worker pods associated with that leader pod. The specified size defines the total number of pods in each leader-worker group, with the leader pod included in that number.
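
For example, with the leader worker set deployed later in this chapter (my-lws with replicas: 2), you would expect one leader stateful set named my-lws plus one worker stateful set per group, named my-lws-0 and my-lws-1. Because each worker stateful set is owned by its leader pod, a sketch of how you might confirm that ownership is:

    $ oc get statefulset my-lws-0 -n my-namespace \
        -o jsonpath='{.metadata.ownerReferences[0].kind}/{.metadata.ownerReferences[0].name}{"\n"}'

Based on the ownership model described above, the expected output is Pod/my-lws-0, the leader pod of the first group.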

3.2. Leader Worker Set Operator release notes

Review the Leader Worker Set Operator release notes to track the development of the Operator and learn what is new and changed with each release.

You can use the Leader Worker Set Operator to manage distributed inference workloads and process large-scale inference requests efficiently.

For more information, see About the Leader Worker Set Operator.

3.2.1. Release notes for Leader Worker Set Operator 1.0.0

Review the release notes for Leader Worker Set Operator 1.0.0 to learn what is new and updated with this release.

Issued: 18 September 2025

The following advisories are available for the Leader Worker Set Operator 1.0.0:

3.2.1.1. New features and enhancements

  • This is the initial release of the Leader Worker Set Operator.

3.3. Managing distributed workloads with the Leader Worker Set Operator

You can use the Leader Worker Set Operator to manage distributed inference workloads and process large-scale inference requests efficiently.

3.3.1. Installing the Leader Worker Set Operator

You can install the Leader Worker Set Operator through the OpenShift Container Platform web console to begin managing distributed AI workloads.

Prerequisites

  • You have access to the cluster with cluster-admin privileges.
  • You have access to the OpenShift Container Platform web console.
  • You have installed the cert-manager Operator for Red Hat OpenShift.

Procedure

  1. Log in to the OpenShift Container Platform web console.
  2. Verify that the cert-manager Operator for Red Hat OpenShift is installed.
  3. Install the Leader Worker Set Operator.

    1. Navigate to Operators → OperatorHub.
    2. Enter Leader Worker Set Operator into the filter box.
    3. Select the Leader Worker Set Operator and click Install.
    4. On the Install Operator page:

      1. The Update channel is set to stable-v1.0, which installs the latest stable release of Leader Worker Set Operator 1.0.
      2. Under Installation mode, select A specific namespace on the cluster.
      3. Under Installed Namespace, select Operator recommended Namespace: openshift-lws-operator.
      4. Under Update approval, select one of the following update strategies:

        • The Automatic strategy allows Operator Lifecycle Manager (OLM) to automatically update the Operator when a new version is available.
        • The Manual strategy requires a user with appropriate credentials to approve the Operator update.
      5. Click Install.
  4. Create the custom resource (CR) for the Leader Worker Set Operator:

    1. Navigate to Installed Operators → Leader Worker Set Operator.
    2. Under Provided APIs, click Create instance in the LeaderWorkerSetOperator pane.
    3. Click Create.
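
If you prefer to drive the installation from the CLI instead of the web console, the following manifests are a sketch of the Operator Lifecycle Manager (OLM) objects that the console steps above create for you. The openshift-lws-operator namespace and the stable-v1.0 channel come from those steps; the package name and catalog source shown here are assumptions, so confirm them with oc get packagemanifests before applying.

    apiVersion: v1
    kind: Namespace
    metadata:
      name: openshift-lws-operator
    ---
    apiVersion: operators.coreos.com/v1
    kind: OperatorGroup
    metadata:
      name: openshift-lws-operator
      namespace: openshift-lws-operator
    spec:
      targetNamespaces:
      - openshift-lws-operator            # matches "A specific namespace on the cluster"
    ---
    apiVersion: operators.coreos.com/v1alpha1
    kind: Subscription
    metadata:
      name: leader-worker-set-operator    # assumption: the package name may differ
      namespace: openshift-lws-operator
    spec:
      channel: stable-v1.0
      name: leader-worker-set-operator    # assumption: verify with oc get packagemanifests
      source: redhat-operators            # assumption: catalog source may differ
      sourceNamespace: openshift-marketplace
      installPlanApproval: Automatic

After the Operator installation reports Succeeded, you can still create the LeaderWorkerSetOperator instance from the web console as described in step 4 of this procedure.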

3.3.2. Deploying a leader worker set

You can use the Leader Worker Set Operator to deploy a leader worker set to assist with managing distributed workloads across nodes.

Prerequisites

  • You have installed the Leader Worker Set Operator.

Procedure

  1. Create a new project by running the following command:

    $ oc new-project my-namespace
  2. Create a file named leader-worker-set.yaml with the following example configuration:

    apiVersion: leaderworkerset.x-k8s.io/v1
    kind: LeaderWorkerSet
    metadata:
      name: my-lws
      namespace: my-namespace
    spec:
      leaderWorkerTemplate:
        leaderTemplate:
          metadata: {}
          spec:
            containers:
            - image: nginxinc/nginx-unprivileged:1.27
              name: leader
              resources: {}
        restartPolicy: RecreateGroupOnPodRestart
        size: 3
        workerTemplate:
          metadata: {}
          spec:
            containers:
            - image: nginxinc/nginx-unprivileged:1.27
              name: worker
              ports:
              - containerPort: 8080
                protocol: TCP
              resources: {}
      networkConfig:
        subdomainPolicy: Shared
      replicas: 2
      rolloutStrategy:
        rollingUpdateConfiguration:
          maxSurge: 1
          maxUnavailable: 1
        type: RollingUpdate
      startupPolicy: LeaderCreated

    where:

    metadata.name
    Specifies the name of the leader worker set resource.
    metadata.namespace
    Specifies the namespace for the leader worker set to run in.
    spec.leaderWorkerTemplate.leaderTemplate
    Specifies the pod template for the leader pods.
    spec.leaderWorkerTemplate.restartPolicy
    Specifies the restart policy for when pod failures occur. Allowed values are RecreateGroupOnPodRestart to restart the whole group or None to not restart the group.
    spec.leaderWorkerTemplate.size
    Specifies the number of pods to create for each group, including the leader pod. For example, a value of 3 creates 1 leader pod and 2 worker pods. The default value is 1.
    spec.leaderWorkerTemplate.workerTemplate
    Specifies the pod template for the worker pods.
    spec.networkConfig.subdomainPolicy
    Specifies the policy to use when creating the headless service. Allowed values are UniquePerReplica or Shared. The default value is Shared.
    spec.replicas
    Specifies the number of replicas, or leader-worker groups. The default value is 1.
    spec.rolloutStrategy.rollingUpdateConfiguration.maxSurge
    Specifies the maximum number of replicas that can be scheduled above the replicas value during rolling updates. The value can be specified as an integer or a percentage.
    spec.rolloutStrategy.rollingUpdateConfiguration.maxUnavailable
    Specifies the maximum number of replicas that can be unavailable during rolling updates. The value can be specified as an integer or a percentage.

    For more information on all available fields to configure, see LeaderWorkerSet API upstream documentation.

  3. Apply the leader worker set configuration by running the following command:

    $ oc apply -f leader-worker-set.yaml

Verification

  1. Verify that pods were created by running the following command:

    $ oc get pods -n my-namespace

    Example output

    NAME         READY   STATUS    RESTARTS   AGE
    my-lws-0     1/1     Running   0          4s
    my-lws-0-1   1/1     Running   0          3s
    my-lws-0-2   1/1     Running   0          3s
    my-lws-1     1/1     Running   0          7s
    my-lws-1-1   1/1     Running   0          6s
    my-lws-1-2   1/1     Running   0          6s

    • my-lws-0 is the leader pod for the first group.
    • my-lws-1 is the leader pod for the second group.
  2. Review the stateful sets by running the following command:

    $ oc get statefulsets

    Example output

    NAME       READY   AGE
    my-lws     4/4     111s
    my-lws-0   2/2     57s
    my-lws-1   2/2     60s

    • my-lws is the leader stateful set for all leader-worker groups.
    • my-lws-0 is the worker stateful set for the first group.
    • my-lws-1 is the worker stateful set for the second group.
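
Because the example sets spec.networkConfig.subdomainPolicy: Shared, a headless service is also created for the pods in the leader worker set, and you can change the number of leader-worker groups by updating spec.replicas. The following commands are a sketch; the exact name of the headless service depends on the installed LeaderWorkerSet version, so treat it as an assumption.

    $ oc get services -n my-namespace
    $ oc patch leaderworkerset my-lws -n my-namespace --type merge -p '{"spec":{"replicas":3}}'

After the patch, a third leader-worker group (a my-lws-2 leader pod and its workers) is expected to be created.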

3.4. Uninstalling the Leader Worker Set Operator

If you no longer need the Leader Worker Set Operator in your cluster, you can uninstall the Operator and remove its related resources.

3.4.1. Uninstalling the Leader Worker Set Operator

You can use the web console to uninstall the Leader Worker Set Operator if you no longer need the Operator in your cluster.

Prerequisites

  • You have access to the cluster with cluster-admin privileges.
  • You have access to the OpenShift Container Platform web console.
  • You have installed the Leader Worker Set Operator.

Procedure

  1. Log in to the OpenShift Container Platform web console.
  2. Navigate to Operators → Installed Operators.
  3. Select openshift-lws-operator from the Project dropdown list.
  4. Delete the LeaderWorkerSetOperator instance.

    1. Click Leader Worker Set Operator and select the LeaderWorkerSetOperator tab.
    2. Click the Options menu ⋮ next to the cluster entry and select Delete LeaderWorkerSetOperator.
    3. In the confirmation dialog, click Delete.
  5. Uninstall the Leader Worker Set Operator.

    1. Navigate to Operators → Installed Operators.
    2. Click the Options menu ⋮ next to the Leader Worker Set Operator entry and click Uninstall Operator.
    3. In the confirmation dialog, click Uninstall.

3.4.2. Uninstalling Leader Worker Set Operator resources

Optionally, remove the custom resource definitions (CRDs) and the associated namespace after the Leader Worker Set Operator is uninstalled. This cleans up all remaining Leader Worker Set Operator artifacts.

Prerequisites

  • You have access to the cluster with cluster-admin privileges.
  • You have access to the OpenShift Container Platform web console.
  • You have uninstalled the Leader Worker Set Operator.

Procedure

  1. Log in to the OpenShift Container Platform web console.
  2. Remove CRDs that were created when the Leader Worker Set Operator was installed:

    1. Navigate to Administration → CustomResourceDefinitions.
    2. Enter LeaderWorkerSetOperator in the Name field to filter the CRDs.
    3. Click the Options menu ⋮ next to the LeaderWorkerSetOperator CRD and select Delete CustomResourceDefinition.
    4. In the confirmation dialog, click Delete.
  3. Delete the openshift-lws-operator namespace.

    1. Navigate to Administration → Namespaces.
    2. Enter openshift-lws-operator into the filter box.
    3. Click the Options menu ⋮ next to the openshift-lws-operator entry and select Delete Namespace.
    4. In the confirmation dialog, enter openshift-lws-operator and click Delete.
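
Alternatively, you can perform the same cleanup from the CLI. The procedure above does not list the exact CRD names, so list them first and substitute the name returned by the filter; the following commands are a sketch.

    $ oc get crd | grep -i leaderworkerset
    $ oc delete crd <crd_name>
    $ oc delete namespace openshift-lws-operator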