Chapter 27. Evicting pods with the Streams for Apache Kafka Drain Cleaner


Kafka pods might be evicted during OpenShift upgrades, maintenance, or pod rescheduling. If your Kafka pods were deployed by Streams for Apache Kafka, you can use the Streams for Apache Kafka Drain Cleaner tool to handle the pod evictions. The Streams for Apache Kafka Drain Cleaner handles the eviction instead of OpenShift.

By deploying the Streams for Apache Kafka Drain Cleaner, you can use the Cluster Operator to move Kafka pods instead of OpenShift. The Cluster Operator ensures that the number of in-sync replicas for topics is at or above the configured min.insync.replicas value, so that Kafka can remain operational during the eviction process. The Cluster Operator waits for topics to synchronize as the OpenShift worker nodes drain consecutively.

An admission webhook notifies the Streams for Apache Kafka Drain Cleaner of pod eviction requests to the Kubernetes API. The Streams for Apache Kafka Drain Cleaner then adds a rolling update annotation to the pods to be drained. The annotation instructs the Cluster Operator to perform a rolling update of each annotated pod.

Note

If you are not using the Streams for Apache Kafka Drain Cleaner, you can add pod annotations to perform rolling updates manually.
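As a minimal sketch of the manual approach, the following helper annotates a single pod with the strimzi.io/manual-rolling-update annotation so that the Cluster Operator rolls it during the next reconciliation. The function name, pod name, and namespace are illustrative assumptions, not part of the product.

```shell
# Hypothetical helper: request a manual rolling update of one Kafka pod.
# Arguments: <pod_name> <namespace> (placeholders for your own values).
annotate_for_rolling_update() {
  oc annotate pod "$1" strimzi.io/manual-rolling-update="true" -n "$2"
}

# Example call (assumed pod and namespace names):
# annotate_for_rolling_update my-cluster-kafka-0 my-project
```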

27.1. Default webhook configuration

The Streams for Apache Kafka Drain Cleaner deployment includes a ValidatingWebhookConfiguration resource that registers the webhook with the Kubernetes API:

apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
# ...
webhooks:
  - name: strimzi-drain-cleaner.strimzi.io
    rules:
      - apiGroups:   [""]
        apiVersions: ["v1"]
        operations:  ["CREATE"]
        resources:   ["pods/eviction"]
        scope:       "Namespaced"
    clientConfig:
      service:
        namespace: "strimzi-drain-cleaner"
        name: "strimzi-drain-cleaner"
        path: /drainer
        port: 443
      caBundle: Cg==
    # ...

Unless you are using your own TLS certificates, no manual configuration is required.

The webhook intercepts pod eviction requests based on the rules defined in the configuration. Only CREATE operations targeting the pods/eviction sub-resource are evaluated. When these conditions are met, the API forwards the request to the webhook.

The clientConfig section specifies the target service and endpoint for the webhook. The webhook listens on the /drainer path and requires a secure TLS connection.

The caBundle property provides the Base64-encoded certificate chain used to validate HTTPS communication. By default, the TLS certificates are generated and injected into the configuration automatically. If you supply your own TLS certificates, you must manually update the caBundle value.
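If you supply your own certificates, the caBundle value is the Base64 encoding of the PEM-encoded CA certificate, written on a single line. The following is a minimal sketch, assuming the CA certificate is stored in a local file. The function name and the ca.crt file name are hypothetical.

```shell
# Hypothetical helper: print a PEM file as single-line Base64,
# suitable for pasting into the caBundle property.
encode_ca_bundle() {
  base64 < "$1" | tr -d '\n'
}

# Example call (assumed file name):
# encode_ca_bundle ca.crt
```

The Cg== placeholder in the default configuration is the Base64 encoding of a single newline character, so it carries no certificate data until it is replaced.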

27.2. Deploying the Streams for Apache Kafka Drain Cleaner using installation files

Deploy the Streams for Apache Kafka Drain Cleaner to the OpenShift cluster where the Cluster Operator and Kafka cluster are running.

Streams for Apache Kafka Drain Cleaner can run in two different modes. By default, the Drain Cleaner denies (blocks) the OpenShift eviction request to prevent OpenShift from evicting the pods and instead uses the Cluster Operator to move the pod. This mode has better compatibility with various cluster autoscaling tools and does not require any specific PodDisruptionBudget configuration. Alternatively, you can enable the legacy mode, in which the Drain Cleaner allows the eviction request while also instructing the Cluster Operator to move the pod. For the legacy mode to work, you must configure the PodDisruptionBudget to not allow any pod evictions by setting the maxUnavailable option to 0.

Prerequisites

  • The Drain Cleaner deployment files, which are included in the Streams for Apache Kafka deployment files.
  • You have a highly available Kafka cluster deployment running with OpenShift worker nodes that you would like to update.
  • Topics are replicated for high availability.

    Topic configuration specifies a replication factor of at least 3 and a minimum number of in-sync replicas of 1 less than the replication factor.

    Kafka topic replicated for high availability

    apiVersion: kafka.strimzi.io/v1beta2
    kind: KafkaTopic
    metadata:
      name: my-topic
      labels:
        strimzi.io/cluster: my-cluster
    spec:
      partitions: 1
      replicas: 3
      config:
        # ...
        min.insync.replicas: 2
        # ...
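The relationship between the two topic settings can be checked with simple arithmetic. The sketch below is illustrative only (the function name is hypothetical): it subtracts min.insync.replicas from the replication factor to show how many replicas of a partition can be offline while producers that require acknowledgment from all in-sync replicas can still write.

```shell
# Illustrative arithmetic only.
# Arguments: <replication_factor> <min.insync.replicas>
tolerated_offline_replicas() {
  echo $(( $1 - $2 ))
}

tolerated_offline_replicas 3 2   # prints 1
```

With the example topic above, one replica can be offline at a time, which is why rolling a single broker pod does not interrupt such producers.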

Using Drain Cleaner in legacy mode

To use the Drain Cleaner in legacy mode, change the default environment variables in the Drain Cleaner Deployment configuration file:

  • Set STRIMZI_DENY_EVICTION to false to use the legacy mode relying on the PodDisruptionBudget configuration.

Example configuration to use legacy mode

apiVersion: apps/v1
kind: Deployment
spec:
  # ...
  template:
    spec:
      serviceAccountName: strimzi-drain-cleaner
      containers:
        - name: strimzi-drain-cleaner
          # ...
          env:
            - name: STRIMZI_DENY_EVICTION
              value: "false"
            - name: STRIMZI_DRAIN_KAFKA
              value: "true"
            # ...

Procedure

  1. If you are using the legacy mode activated by setting the STRIMZI_DENY_EVICTION environment variable to false, you must also configure the PodDisruptionBudget resource. Set maxUnavailable to 0 (zero) in the Kafka section of the Kafka resource using template settings.

    Specifying a pod disruption budget

    apiVersion: kafka.strimzi.io/v1beta2
    kind: Kafka
    metadata:
      name: my-cluster
      annotations:
        strimzi.io/node-pools: enabled
        strimzi.io/kraft: enabled
      namespace: myproject
    spec:
      kafka:
        template:
          podDisruptionBudget:
            maxUnavailable: 0
      # ...

    This setting prevents the automatic eviction of pods in case of planned disruptions, leaving the Streams for Apache Kafka Drain Cleaner and Cluster Operator to roll the pods on different worker nodes.

  2. Update the Kafka resource:

    oc apply -f <kafka_configuration_file>

  3. Deploy the Streams for Apache Kafka Drain Cleaner.

    • To run the Drain Cleaner on OpenShift, apply the resources in the /install/drain-cleaner/openshift directory.

      oc apply -f ./install/drain-cleaner/openshift
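After applying the files, you might want to confirm that the Drain Cleaner is running and that its webhook is registered. The following is a sketch, assuming the strimzi-drain-cleaner deployment name and namespace used in the configuration examples in this chapter. The function name is hypothetical, and the command lists all ValidatingWebhookConfiguration resources rather than assuming the name of a specific one.

```shell
# Hypothetical helper: wait for the Drain Cleaner rollout to complete,
# then list the registered validating webhook configurations.
verify_drain_cleaner() {
  oc rollout status deployment/strimzi-drain-cleaner -n strimzi-drain-cleaner &&
  oc get validatingwebhookconfiguration
}

# Example call:
# verify_drain_cleaner
```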

27.3. Using the Streams for Apache Kafka Drain Cleaner

Use the Streams for Apache Kafka Drain Cleaner in combination with the Cluster Operator to move Kafka broker pods from nodes that are being drained. When you run the Streams for Apache Kafka Drain Cleaner, it annotates pods with a rolling update pod annotation. The Cluster Operator performs rolling updates based on the annotation.

Considerations when using anti-affinity configuration

When using anti-affinity with your Kafka pods, consider adding a spare worker node to your cluster. Including spare nodes ensures that your cluster has the capacity to reschedule pods during node draining or temporary unavailability of other nodes. When a worker node is drained, and anti-affinity rules restrict pod rescheduling on alternative nodes, spare nodes help prevent restarted pods from becoming unschedulable. This mitigates the risk of the draining operation failing.

Procedure

  1. Drain a specified OpenShift node hosting the Kafka broker.

    oc get nodes
    oc adm drain <name_of_node> --delete-emptydir-data --ignore-daemonsets --timeout=6000s --force

  2. Check the eviction events in the Streams for Apache Kafka Drain Cleaner log to verify that the pods have been annotated for restart.

    Streams for Apache Kafka Drain Cleaner log shows annotations of pods

    INFO ... Received eviction webhook for Pod my-cluster-kafka-0 in namespace my-project
    INFO ... Pod my-cluster-kafka-0 in namespace my-project will be annotated for restart
    INFO ... Pod my-cluster-kafka-0 in namespace my-project found and annotated for restart

  3. Check the reconciliation events in the Cluster Operator log to verify the rolling updates.

    Cluster Operator log shows rolling updates

    INFO  PodOperator:68 - Reconciliation #13(timer) Kafka(my-project/my-cluster): Rolling Pod my-cluster-kafka-0
    INFO  AbstractOperator:500 - Reconciliation #13(timer) Kafka(my-project/my-cluster): reconciled
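To narrow the Drain Cleaner log to the relevant events, you can filter it for the confirmation message shown in the log example above. The following is a sketch with a hypothetical function name. The example call assumes that the Drain Cleaner runs as the strimzi-drain-cleaner deployment in the strimzi-drain-cleaner namespace.

```shell
# Hypothetical helper: keep only the log lines confirming that a pod
# was annotated for restart. Reads log text on standard input.
annotated_pods() {
  grep 'found and annotated for restart'
}

# Example call (assumed deployment and namespace):
# oc logs deployment/strimzi-drain-cleaner -n strimzi-drain-cleaner | annotated_pods
```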

By default, the Drain Cleaner deployment watches the secret containing the TLS certificates it uses for authentication. The Drain Cleaner watches for changes, such as certificate renewals. If it detects a change, it restarts to reload the TLS certificates. The Drain Cleaner installation files enable this behavior by default. However, you can disable the watching of certificates by setting the STRIMZI_CERTIFICATE_WATCH_ENABLED environment variable to false in the Deployment configuration (060-Deployment.yaml) of the Drain Cleaner installation files.

With STRIMZI_CERTIFICATE_WATCH_ENABLED enabled, you can also use the following environment variables for watching TLS certificates.

Table 27.1. Drain Cleaner environment variables for watching TLS certificates

Environment Variable                   Description                                                                                Default
STRIMZI_CERTIFICATE_WATCH_ENABLED      Enables or disables the certificate watch                                                  false
STRIMZI_CERTIFICATE_WATCH_NAMESPACE    The namespace where the Drain Cleaner is deployed and where the certificate secret exists  strimzi-drain-cleaner
STRIMZI_CERTIFICATE_WATCH_POD_NAME     The Drain Cleaner pod name                                                                 -
STRIMZI_CERTIFICATE_WATCH_SECRET_NAME  The name of the secret containing TLS certificates                                         strimzi-drain-cleaner
STRIMZI_CERTIFICATE_WATCH_SECRET_KEYS  The list of fields inside the secret that contain the TLS certificates                     tls.crt, tls.key

Example environment variable configuration to control watch operations

apiVersion: apps/v1
kind: Deployment
metadata:
  name: strimzi-drain-cleaner
  labels:
    app: strimzi-drain-cleaner
  namespace: strimzi-drain-cleaner
spec:
  # ...
  template:
    spec:
      serviceAccountName: strimzi-drain-cleaner
      containers:
        - name: strimzi-drain-cleaner
          # ...
          env:
            - name: STRIMZI_DRAIN_KAFKA
              value: "true"
            - name: STRIMZI_CERTIFICATE_WATCH_ENABLED
              value: "true"
            - name: STRIMZI_CERTIFICATE_WATCH_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            - name: STRIMZI_CERTIFICATE_WATCH_POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
              # ...

Tip

Use the Downward API mechanism, as described in the OpenShift Nodes guide, to configure STRIMZI_CERTIFICATE_WATCH_NAMESPACE and STRIMZI_CERTIFICATE_WATCH_POD_NAME.
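If the Drain Cleaner is already deployed, the certificate watch can also be switched on or off without editing the installation files. The following is a sketch, assuming the strimzi-drain-cleaner deployment name and namespace from the example above. The function name is hypothetical.

```shell
# Hypothetical helper: set STRIMZI_CERTIFICATE_WATCH_ENABLED on the
# running Drain Cleaner deployment. Argument: true or false.
set_certificate_watch() {
  oc set env deployment/strimzi-drain-cleaner \
    STRIMZI_CERTIFICATE_WATCH_ENABLED="$1" -n strimzi-drain-cleaner
}

# Example call:
# set_certificate_watch false
```

Changing an environment variable on the deployment triggers a rollout of a new Drain Cleaner pod. Reapplying the installation files later overwrites the change unless the files are updated as well.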
