Chapter 27. Evicting pods with the Streams for Apache Kafka Drain Cleaner
Kafka pods might be evicted during OpenShift upgrades, maintenance, or pod rescheduling. If your Kafka pods were deployed by Streams for Apache Kafka, you can use the Streams for Apache Kafka Drain Cleaner tool to handle the pod evictions. The Streams for Apache Kafka Drain Cleaner handles the eviction instead of OpenShift.
By deploying the Streams for Apache Kafka Drain Cleaner, you can use the Cluster Operator to move Kafka pods instead of OpenShift. The Cluster Operator ensures that the number of in-sync replicas for topics stays at or above the configured min.insync.replicas, so Kafka can remain operational during the eviction process. The Cluster Operator waits for topics to synchronize as the OpenShift worker nodes drain consecutively.
An admission webhook notifies the Streams for Apache Kafka Drain Cleaner of pod eviction requests to the Kubernetes API. The Streams for Apache Kafka Drain Cleaner then adds a rolling update annotation to the pods to be drained. This informs the Cluster Operator to perform a rolling update of an evicted pod.
If you are not using the Streams for Apache Kafka Drain Cleaner, you can add pod annotations to perform rolling updates manually.
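As a sketch of the manual approach, the Cluster Operator rolls a pod when it carries the strimzi.io/manual-rolling-update annotation; the pod and namespace names below are illustrative:

```shell
# Annotate a broker pod so the Cluster Operator restarts it
# during the next reconciliation (names are examples).
oc annotate pod my-cluster-kafka-0 -n my-project \
  strimzi.io/manual-rolling-update="true"
```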
27.1. Default webhook configuration
The Strimzi Drain Cleaner deployment includes a ValidatingWebhookConfiguration resource that registers the webhook with the Kubernetes API:
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
# ...
webhooks:
- name: strimzi-drain-cleaner.strimzi.io
rules:
- apiGroups: [""]
apiVersions: ["v1"]
operations: ["CREATE"]
resources: ["pods/eviction"]
scope: "Namespaced"
clientConfig:
service:
namespace: "strimzi-drain-cleaner"
name: "strimzi-drain-cleaner"
path: /drainer
port: 443
caBundle: Cg==
# ...
Unless you are using your own TLS certificates, no manual configuration is required.
The webhook intercepts pod eviction requests based on the rules defined in the configuration. Only CREATE operations targeting the pods/eviction sub-resource are evaluated. When these conditions are met, the API forwards the request to the webhook.
The clientConfig section specifies the target service and endpoint for the webhook. The webhook listens on the /drainer path and requires a secure TLS connection.
The caBundle property provides the Base64-encoded certificate chain used to validate HTTPS communication. By default, the TLS certificates are generated and injected into the configuration automatically. If you supply your own TLS certificates, you must manually update the caBundle value.
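If you do supply your own certificates, the caBundle value can be replaced along these lines; a sketch, assuming your CA certificate is in a local ca.crt file and the webhook configuration resource is named strimzi-drain-cleaner (verify the resource name in your installation):

```shell
# Base64-encode the CA certificate and patch it into the
# first webhook entry of the configuration.
CA_BUNDLE=$(base64 -w0 ca.crt)
oc patch validatingwebhookconfiguration strimzi-drain-cleaner \
  --type=json \
  -p "[{\"op\": \"replace\", \"path\": \"/webhooks/0/clientConfig/caBundle\", \"value\": \"${CA_BUNDLE}\"}]"
```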
27.2. Deploying the Streams for Apache Kafka Drain Cleaner using installation files
Deploy the Streams for Apache Kafka Drain Cleaner to the OpenShift cluster where the Cluster Operator and Kafka cluster are running.
Streams for Apache Kafka Drain Cleaner can run in two different modes. By default, the Drain Cleaner denies (blocks) the OpenShift eviction request to prevent OpenShift from evicting the pods and instead uses the Cluster Operator to move the pod. This mode has better compatibility with various cluster autoscaling tools and does not require any specific PodDisruptionBudget configuration. Alternatively, you can enable the legacy mode, in which the Drain Cleaner allows the eviction request while also instructing the Cluster Operator to move the pod. For the legacy mode to work, you must configure the PodDisruptionBudget to not allow any pod evictions by setting the maxUnavailable option to 0.
Prerequisites
- The Drain Cleaner deployment files, which are included in the Streams for Apache Kafka deployment files.
- You have a highly available Kafka cluster deployment running with OpenShift worker nodes that you would like to update.
Topics are replicated for high availability.
Topic configuration specifies a replication factor of at least 3 and a minimum number of in-sync replicas that is 1 less than the replication factor.
Kafka topic replicated for high availability
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: my-topic
  labels:
    strimzi.io/cluster: my-cluster
spec:
  partitions: 1
  replicas: 3
  config:
    # ...
    min.insync.replicas: 2
    # ...
Using Drain Cleaner in legacy mode
To use the Drain Cleaner in legacy mode, change the default environment variables in the Drain Cleaner Deployment configuration file:
- Set STRIMZI_DENY_EVICTION to false to use the legacy mode relying on the PodDisruptionBudget configuration.
Example configuration to use legacy mode
apiVersion: apps/v1
kind: Deployment
spec:
# ...
template:
spec:
serviceAccountName: strimzi-drain-cleaner
containers:
- name: strimzi-drain-cleaner
# ...
env:
- name: STRIMZI_DENY_EVICTION
value: "false"
- name: STRIMZI_DRAIN_KAFKA
value: "true"
# ...
Procedure
If you are using the legacy mode activated by setting the STRIMZI_DENY_EVICTION environment variable to false, you must also configure the PodDisruptionBudget resource. Set maxUnavailable to 0 (zero) in the Kafka section of the Kafka resource using template settings.

Specifying a pod disruption budget

apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
  annotations:
    strimzi.io/node-pools: enabled
    strimzi.io/kraft: enabled
  namespace: myproject
spec:
  kafka:
    template:
      podDisruptionBudget:
        maxUnavailable: 0
  # ...

This setting prevents the automatic eviction of pods in case of planned disruptions, leaving the Streams for Apache Kafka Drain Cleaner and Cluster Operator to roll the pods on different worker nodes.
Update the Kafka resource:

oc apply -f <kafka_configuration_file>

Deploy the Streams for Apache Kafka Drain Cleaner.
To run the Drain Cleaner on OpenShift, apply the resources in the /install/drain-cleaner/openshift directory.

oc apply -f ./install/drain-cleaner/openshift
27.3. Using the Streams for Apache Kafka Drain Cleaner
Use the Streams for Apache Kafka Drain Cleaner in combination with the Cluster Operator to move Kafka broker pods from nodes that are being drained. When you run the Streams for Apache Kafka Drain Cleaner, it annotates pods with a rolling update pod annotation. The Cluster Operator performs rolling updates based on the annotation.
Prerequisites
Considerations when using anti-affinity configuration
When using anti-affinity with your Kafka pods, consider adding a spare worker node to your cluster. Including spare nodes ensures that your cluster has the capacity to reschedule pods during node draining or temporary unavailability of other nodes. When a worker node is drained, and anti-affinity rules restrict pod rescheduling on alternative nodes, spare nodes help prevent restarted pods from becoming unschedulable. This mitigates the risk of the draining operation failing.
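As an illustration, anti-affinity that keeps broker pods on separate worker nodes might be configured through the pod template of the Kafka resource; this is a sketch, assuming the strimzi.io/name pod label applied by Streams for Apache Kafka and an illustrative cluster name of my-cluster:

```yaml
# Sketch: require Kafka broker pods to be scheduled on different nodes.
# The label value is illustrative; adjust it to match your cluster name.
spec:
  kafka:
    template:
      pod:
        affinity:
          podAntiAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              - labelSelector:
                  matchLabels:
                    strimzi.io/name: my-cluster-kafka
                topologyKey: kubernetes.io/hostname
```

With this rule in place, a drained broker pod can only be rescheduled onto a node that does not already host a broker, which is why a spare worker node helps keep the pod schedulable.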
Procedure
Drain a specified OpenShift node hosting the Kafka broker.
oc get nodes
oc drain <name_of_node> --delete-emptydir-data --ignore-daemonsets --timeout=6000s --force

Check the eviction events in the Streams for Apache Kafka Drain Cleaner log to verify that the pods have been annotated for restart.
Streams for Apache Kafka Drain Cleaner log shows annotations of pods
INFO ... Received eviction webhook for Pod my-cluster-kafka-0 in namespace my-project
INFO ... Pod my-cluster-kafka-0 in namespace my-project will be annotated for restart
INFO ... Pod my-cluster-kafka-0 in namespace my-project found and annotated for restart

Check the reconciliation events in the Cluster Operator log to verify the rolling updates.
Cluster Operator log shows rolling updates
INFO PodOperator:68 - Reconciliation #13(timer) Kafka(my-project/my-cluster): Rolling Pod my-cluster-kafka-0
INFO AbstractOperator:500 - Reconciliation #13(timer) Kafka(my-project/my-cluster): reconciled
27.4. Watching the TLS certificates used by the Streams for Apache Kafka Drain Cleaner
By default, the Drain Cleaner deployment watches the secret containing the TLS certificates it uses for authentication. The Drain Cleaner watches for changes, such as certificate renewals, and restarts to reload the TLS certificates when it detects a change. The Drain Cleaner installation files enable this behavior by default, but you can disable the certificate watch by setting the STRIMZI_CERTIFICATE_WATCH_ENABLED environment variable to false in the Deployment configuration (060-Deployment.yaml) of the Drain Cleaner installation files.
When STRIMZI_CERTIFICATE_WATCH_ENABLED is set to true, you can also use the following environment variables to configure the certificate watch.
| Environment Variable | Description | Default |
|---|---|---|
| STRIMZI_CERTIFICATE_WATCH_ENABLED | Enables or disables the certificate watch | false |
| STRIMZI_CERTIFICATE_WATCH_NAMESPACE | The namespace where the Drain Cleaner is deployed and where the certificate secret exists | strimzi-drain-cleaner |
| STRIMZI_CERTIFICATE_WATCH_POD_NAME | The Drain Cleaner pod name | - |
| STRIMZI_CERTIFICATE_WATCH_SECRET_NAME | The name of the secret containing TLS certificates | strimzi-drain-cleaner |
| STRIMZI_CERTIFICATE_WATCH_SECRET_KEYS | The list of fields inside the secret that contain the TLS certificates | tls.crt, tls.key |
Example environment variable configuration to control watch operations
apiVersion: apps/v1
kind: Deployment
metadata:
name: strimzi-drain-cleaner
labels:
app: strimzi-drain-cleaner
namespace: strimzi-drain-cleaner
spec:
# ...
spec:
serviceAccountName: strimzi-drain-cleaner
containers:
- name: strimzi-drain-cleaner
# ...
env:
- name: STRIMZI_DRAIN_KAFKA
value: "true"
- name: STRIMZI_CERTIFICATE_WATCH_ENABLED
value: "true"
- name: STRIMZI_CERTIFICATE_WATCH_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
- name: STRIMZI_CERTIFICATE_WATCH_POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
# ...
Use the Downward API mechanism (described in the OpenShift Nodes guide) to configure STRIMZI_CERTIFICATE_WATCH_NAMESPACE and STRIMZI_CERTIFICATE_WATCH_POD_NAME.