Chapter 23. Evicting pods with the Streams for Apache Kafka Drain Cleaner


Kafka and ZooKeeper pods might be evicted during OpenShift upgrades, maintenance, or pod rescheduling. If your Kafka and ZooKeeper pods were deployed by Streams for Apache Kafka, you can use the Streams for Apache Kafka Drain Cleaner tool to handle the pod evictions instead of OpenShift.

By deploying the Streams for Apache Kafka Drain Cleaner, you can use the Cluster Operator to move Kafka pods instead of OpenShift. The Cluster Operator ensures that topics are never under-replicated and Kafka can remain operational during the eviction process. The Cluster Operator waits for topics to synchronize, as the OpenShift worker nodes drain consecutively.

An admission webhook notifies the Streams for Apache Kafka Drain Cleaner of pod eviction requests made to the Kubernetes API. The Streams for Apache Kafka Drain Cleaner then adds a rolling update annotation to the pods to be drained, which informs the Cluster Operator to perform a rolling update of the evicted pod.

Note

If you are not using the Streams for Apache Kafka Drain Cleaner, you can add pod annotations to perform rolling updates manually.
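
For example, a manual rolling update can be triggered by applying the strimzi.io/manual-rolling-update annotation to a pod (shown here with a hypothetical pod name); the Cluster Operator rolls the annotated pod during its next reconciliation:

oc annotate pod my-cluster-kafka-0 strimzi.io/manual-rolling-update="true"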

Webhook configuration

The Streams for Apache Kafka Drain Cleaner deployment files include a ValidatingWebhookConfiguration resource file. The resource provides the configuration for registering the webhook with the Kubernetes API.

The configuration defines the rules for the Kubernetes API to follow in the event of a pod eviction request. The rules specify that only CREATE operations on the pods/eviction sub-resource are intercepted. If a request matches these rules, the Kubernetes API forwards it to the Streams for Apache Kafka Drain Cleaner.
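
For example, oc drain creates an Eviction object against the pods/eviction sub-resource of each pod it evicts, which is the kind of request these rules intercept (a representative sketch with hypothetical pod and namespace names):

apiVersion: policy/v1
kind: Eviction
metadata:
  name: my-cluster-kafka-0
  namespace: my-project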

The clientConfig points to the Streams for Apache Kafka Drain Cleaner service and the /drainer endpoint that exposes the webhook. The webhook uses a secure TLS connection, which requires authentication. The caBundle property specifies the certificate chain used to validate HTTPS communication. Certificates are encoded in Base64.

Webhook configuration for pod eviction notifications

apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
# ...
webhooks:
  - name: strimzi-drain-cleaner.strimzi.io
    rules:
      - apiGroups:   [""]
        apiVersions: ["v1"]
        operations:  ["CREATE"]
        resources:   ["pods/eviction"]
        scope:       "Namespaced"
    clientConfig:
      service:
        namespace: "strimzi-drain-cleaner"
        name: "strimzi-drain-cleaner"
        path: /drainer
        port: 443
      caBundle: Cg==
    # ...

23.1. Downloading the Streams for Apache Kafka Drain Cleaner deployment files

To deploy and use the Streams for Apache Kafka Drain Cleaner, you need to download the deployment files.

The Streams for Apache Kafka Drain Cleaner deployment files are available from the Streams for Apache Kafka software downloads page.

23.2. Deploying the Streams for Apache Kafka Drain Cleaner using installation files

Deploy the Streams for Apache Kafka Drain Cleaner to the OpenShift cluster where the Cluster Operator and Kafka cluster are running.

Streams for Apache Kafka Drain Cleaner can run in two different modes. By default, the Drain Cleaner denies (blocks) the OpenShift eviction request to prevent OpenShift from evicting the pods, and instead uses the Cluster Operator to move them. This mode has better compatibility with various cluster autoscaling tools and does not require any specific PodDisruptionBudget configuration. Alternatively, you can enable a legacy mode in which the Drain Cleaner allows the eviction request while also instructing the Cluster Operator to move the pod. For the legacy mode to work, you must configure the PodDisruptionBudget to not allow any pod evictions by setting the maxUnavailable option to 0.

Prerequisites

  • You have downloaded the Streams for Apache Kafka Drain Cleaner deployment files.
  • You have a highly available Kafka cluster deployment running with OpenShift worker nodes that you would like to update.
  • Topics are replicated for high availability.

    Topic configuration specifies a replication factor of at least 3 and a minimum number of in-sync replicas that is 1 less than the replication factor.

    Kafka topic replicated for high availability

    apiVersion: kafka.strimzi.io/v1beta2
    kind: KafkaTopic
    metadata:
      name: my-topic
      labels:
        strimzi.io/cluster: my-cluster
    spec:
      partitions: 1
      replicas: 3
      config:
        # ...
        min.insync.replicas: 2
        # ...

Excluding Kafka or ZooKeeper

If you don’t want to include Kafka or ZooKeeper pods in Drain Cleaner operations, or if you prefer to use the Drain Cleaner in legacy mode, change the default environment variables in the Drain Cleaner Deployment configuration file:

  • Set STRIMZI_DENY_EVICTION to false to use the legacy mode relying on the PodDisruptionBudget configuration
  • Set STRIMZI_DRAIN_KAFKA to false to exclude Kafka pods
  • Set STRIMZI_DRAIN_ZOOKEEPER to false to exclude ZooKeeper pods

Example configuration to exclude ZooKeeper pods

apiVersion: apps/v1
kind: Deployment
spec:
  # ...
  template:
    spec:
      serviceAccountName: strimzi-drain-cleaner
      containers:
        - name: strimzi-drain-cleaner
          # ...
          env:
            - name: STRIMZI_DENY_EVICTION
              value: "true"
            - name: STRIMZI_DRAIN_KAFKA
              value: "true"
            - name: STRIMZI_DRAIN_ZOOKEEPER
              value: "false"
          # ...
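
Conversely, to run the Drain Cleaner in legacy mode, set STRIMZI_DENY_EVICTION to false (a minimal sketch of the corresponding env entry; legacy mode also requires the PodDisruptionBudget configuration shown in the procedure that follows):

Example configuration to enable the legacy mode

apiVersion: apps/v1
kind: Deployment
spec:
  # ...
  template:
    spec:
      containers:
        - name: strimzi-drain-cleaner
          # ...
          env:
            - name: STRIMZI_DENY_EVICTION
              value: "false"
          # ...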

Procedure

  1. If you are using the legacy mode activated by setting the STRIMZI_DENY_EVICTION environment variable to false, you must also configure the PodDisruptionBudget resource. Set maxUnavailable to 0 (zero) in the Kafka and ZooKeeper sections of the Kafka resource using template settings.

    Specifying a pod disruption budget

    apiVersion: kafka.strimzi.io/v1beta2
    kind: Kafka
    metadata:
      name: my-cluster
      namespace: myproject
    spec:
      kafka:
        template:
          podDisruptionBudget:
            maxUnavailable: 0
      # ...
      zookeeper:
        template:
          podDisruptionBudget:
            maxUnavailable: 0
      # ...

    This setting prevents the automatic eviction of pods in case of planned disruptions, leaving the Streams for Apache Kafka Drain Cleaner and Cluster Operator to roll the pods on different worker nodes.

    The example includes the same configuration for ZooKeeper, which is required if you want to use the Streams for Apache Kafka Drain Cleaner to drain ZooKeeper nodes.

  2. Update the Kafka resource:

    oc apply -f <kafka_configuration_file>
  3. Deploy the Streams for Apache Kafka Drain Cleaner.

    • To run the Drain Cleaner on OpenShift, apply the resources in the /install/drain-cleaner/openshift directory.

      oc apply -f ./install/drain-cleaner/openshift
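
      You can verify that the Drain Cleaner is running by checking its deployment (a quick check, assuming the default strimzi-drain-cleaner namespace used in the installation files):

      oc get deployment strimzi-drain-cleaner -n strimzi-drain-cleaner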

23.3. Using the Streams for Apache Kafka Drain Cleaner

Use the Streams for Apache Kafka Drain Cleaner in combination with the Cluster Operator to move Kafka broker or ZooKeeper pods from nodes that are being drained. When the Streams for Apache Kafka Drain Cleaner intercepts an eviction request, it annotates the affected pods with a rolling update annotation. The Cluster Operator then performs rolling updates based on the annotation.

Considerations when using anti-affinity configuration

When using anti-affinity with your Kafka or ZooKeeper pods, consider adding a spare worker node to your cluster. Including spare nodes ensures that your cluster has the capacity to reschedule pods during node draining or temporary unavailability of other nodes. When a worker node is drained, and anti-affinity rules restrict pod rescheduling on alternative nodes, spare nodes help prevent restarted pods from becoming unschedulable. This mitigates the risk of the draining operation failing.
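
For reference, a minimal sketch of a required anti-affinity rule in the Kafka pod template that spreads broker pods across worker nodes (the strimzi.io/name label value follows the <cluster_name>-kafka naming pattern; adjust the names for your cluster):

Example anti-affinity configuration for Kafka pods

apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    template:
      pod:
        affinity:
          podAntiAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              - labelSelector:
                  matchLabels:
                    strimzi.io/name: my-cluster-kafka
                topologyKey: kubernetes.io/hostname
  # ...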

Procedure

  1. Drain a specified OpenShift node hosting the Kafka broker or ZooKeeper pods.

    oc get nodes
    oc drain <name-of-node> --delete-emptydir-data --ignore-daemonsets --timeout=6000s --force
  2. Check the eviction events in the Streams for Apache Kafka Drain Cleaner log to verify that the pods have been annotated for restart.

    Streams for Apache Kafka Drain Cleaner log shows the annotation of pods

    INFO ... Received eviction webhook for Pod my-cluster-zookeeper-2 in namespace my-project
    INFO ... Pod my-cluster-zookeeper-2 in namespace my-project will be annotated for restart
    INFO ... Pod my-cluster-zookeeper-2 in namespace my-project found and annotated for restart
    
    INFO ... Received eviction webhook for Pod my-cluster-kafka-0 in namespace my-project
    INFO ... Pod my-cluster-kafka-0 in namespace my-project will be annotated for restart
    INFO ... Pod my-cluster-kafka-0 in namespace my-project found and annotated for restart

  3. Check the reconciliation events in the Cluster Operator log to verify the rolling updates.

    Cluster Operator log shows rolling updates

    INFO  PodOperator:68 - Reconciliation #13(timer) Kafka(my-project/my-cluster): Rolling Pod my-cluster-zookeeper-2
    INFO  PodOperator:68 - Reconciliation #13(timer) Kafka(my-project/my-cluster): Rolling Pod my-cluster-kafka-0
    INFO  AbstractOperator:500 - Reconciliation #13(timer) Kafka(my-project/my-cluster): reconciled
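
When node maintenance is complete, make the node schedulable again with the standard OpenShift command (not specific to the Drain Cleaner):

oc adm uncordon <name-of-node>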

23.4. Watching the TLS certificates used by the Streams for Apache Kafka Drain Cleaner

By default, the Drain Cleaner deployment watches the secret containing the TLS certificates it uses for authentication. The Drain Cleaner watches for changes, such as certificate renewals. If it detects a change, it restarts to reload the TLS certificates. The Drain Cleaner installation files enable this behavior by default, but you can disable the watching of certificates by setting the STRIMZI_CERTIFICATE_WATCH_ENABLED environment variable to false in the Deployment configuration (060-Deployment.yaml) of the Drain Cleaner installation files.

When STRIMZI_CERTIFICATE_WATCH_ENABLED is set to true, you can also use the following environment variables to configure the certificate watch.

Table 23.1. Drain Cleaner environment variables for watching TLS certificates

| Environment Variable                   | Description                                                                                | Default               |
|----------------------------------------|--------------------------------------------------------------------------------------------|-----------------------|
| STRIMZI_CERTIFICATE_WATCH_ENABLED      | Enables or disables the certificate watch                                                  | false                 |
| STRIMZI_CERTIFICATE_WATCH_NAMESPACE    | The namespace where the Drain Cleaner is deployed and where the certificate secret exists  | strimzi-drain-cleaner |
| STRIMZI_CERTIFICATE_WATCH_POD_NAME     | The Drain Cleaner pod name                                                                 | -                     |
| STRIMZI_CERTIFICATE_WATCH_SECRET_NAME  | The name of the secret containing TLS certificates                                         | strimzi-drain-cleaner |
| STRIMZI_CERTIFICATE_WATCH_SECRET_KEYS  | The list of fields inside the secret that contain the TLS certificates                     | tls.crt, tls.key      |

Example environment variable configuration to control watch operations

apiVersion: apps/v1
kind: Deployment
metadata:
  name: strimzi-drain-cleaner
  labels:
    app: strimzi-drain-cleaner
  namespace: strimzi-drain-cleaner
spec:
  # ...
    spec:
      serviceAccountName: strimzi-drain-cleaner
      containers:
        - name: strimzi-drain-cleaner
          # ...
          env:
            - name: STRIMZI_DRAIN_KAFKA
              value: "true"
            - name: STRIMZI_DRAIN_ZOOKEEPER
              value: "true"
            - name: STRIMZI_CERTIFICATE_WATCH_ENABLED
              value: "true"
            - name: STRIMZI_CERTIFICATE_WATCH_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            - name: STRIMZI_CERTIFICATE_WATCH_POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
              # ...

Tip

Use the Downward API mechanism to configure STRIMZI_CERTIFICATE_WATCH_NAMESPACE and STRIMZI_CERTIFICATE_WATCH_POD_NAME.
