Chapter 28. Finding information on Kafka restarts

After the Cluster Operator restarts a Kafka pod in an OpenShift cluster, it emits an OpenShift event into the pod’s namespace explaining why the pod restarted. For help in understanding cluster behavior, you can check restart events from the command line.

Tip

You can export and monitor restart events using metrics collection tools like Prometheus. Use the metrics tool with an event exporter that can export the output in a suitable format.

28.1. Reasons for a restart event

The Cluster Operator initiates a restart event for a specific reason. You can check the reason by fetching information on the restart event.

Table 28.1. Restart reasons
Event

Description

CaCertHasOldGeneration

The pod is still using a server certificate signed with an old CA, so it needs to be restarted as part of the certificate update.

CaCertRemoved

Expired CA certificates have been removed, and the pod is restarted to run with the current certificates.

CaCertRenewed

CA certificates have been renewed, and the pod is restarted to run with the updated certificates.

ClientCaCertKeyReplaced

The key used to sign the clients CA certificates has been replaced, and the pod is being restarted as part of the CA renewal process.

ClusterCaCertKeyReplaced

The key used to sign the cluster’s CA certificates has been replaced, and the pod is being restarted as part of the CA renewal process.

ConfigChangeRequiresRestart

A Kafka configuration property was changed that cannot be applied dynamically, so the broker must be restarted for the change to take effect.

FileSystemResizeNeeded

The file system size has been increased, and a restart is needed to apply it.

KafkaCertificatesChanged

One or more TLS certificates used by the Kafka broker have been updated, and a restart is needed to use them.

ManualRollingUpdate

A user annotated the pod, or the StrimziPodSet it belongs to, to trigger a restart.

PodForceRestartOnError

An error occurred that requires a pod restart to rectify.

PodHasOldRevision

The StrimziPodSet that the pod is a member of has been updated, or a disk was added or removed from the Kafka volumes, so the pod needs to be recreated to apply the change.

PodStuck

The pod is still pending, and is not scheduled or cannot be scheduled, so the operator has restarted the pod in a final attempt to get it running.

PodUnresponsive

Streams for Apache Kafka was unable to connect to the pod, which can indicate a broker not starting correctly, so the operator restarted it in an attempt to resolve the issue.
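To see which of these reasons occur most often, the REASON column of the default oc get events output can be tallied. The following is a minimal sketch, not part of the product: count_restart_reasons is a hypothetical helper name, and it assumes the default column layout (LAST SEEN, TYPE, REASON, OBJECT, MESSAGE) with a header row. The sample rows are copied from the example output in the procedure later in this chapter.

```shell
# Hypothetical helper: count restart events per reason.
# Reads the default table output of `oc get events` on stdin,
# skips the header row, and tallies the third column (REASON).
count_restart_reasons() {
  awk 'NR > 1 { count[$3]++ } END { for (r in count) print count[r], r }' | sort -k2
}

# Demonstration using sample rows instead of a live cluster.
count_restart_reasons <<'EOF'
LAST SEEN   TYPE     REASON                   OBJECT                        MESSAGE
2m          Normal   CaCertRenewed            pod/strimzi-cluster-kafka-0   CA certificate renewed
58m         Normal   PodForceRestartOnError   pod/strimzi-cluster-kafka-1   Pod needs to be forcibly restarted due to an error
5m47s       Normal   ManualRollingUpdate      pod/strimzi-cluster-kafka-2   Pod was manually annotated to be rolled
EOF
```

Against a running cluster, the output of the oc get events command can be piped into the helper in place of the sample rows.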

28.2. Restart event filters

When checking restart events from the command line, you can specify a field-selector to filter on OpenShift event fields.

The following fields are available when filtering events with field-selector.

regardingObject.kind
The kind of object that was restarted. For restart events, the kind is always Pod.
regarding.namespace
The namespace that the pod belongs to.
regardingObject.name
The pod’s name, for example, strimzi-cluster-kafka-0.
regardingObject.uid
The unique ID of the pod.
reason
The reason the pod was restarted, for example, JbodVolumesChanged.
reportingController
The reporting component is always strimzi.io/cluster-operator for Streams for Apache Kafka restart events.
source
source is an older version of reportingController. The reporting component is always strimzi.io/cluster-operator for Streams for Apache Kafka restart events.
type
The event type, which is either Warning or Normal. For Streams for Apache Kafka restart events, the type is Normal.

Note

In older versions of OpenShift, the fields using the regarding prefix might use an involvedObject prefix instead. reportingController was previously called reportingComponent.
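Several fields can be combined in one field-selector by separating the terms with commas, in which case an event must match all of them. The following sketch shows one way to assemble such a selector. The function name restart_event_selector is hypothetical. The field names are taken from the list above, and the function always starts from the Cluster Operator as the reporting controller.

```shell
# Hypothetical helper: build a comma-separated field selector that always
# limits results to events reported by the Cluster Operator.
# Usage: restart_event_selector [field=value ...]
restart_event_selector() (
  selector="reportingController=strimzi.io/cluster-operator"
  for term in "$@"; do
    selector="${selector},${term}"
  done
  printf '%s\n' "$selector"
)

# Example: restrict the selector to one restart reason and event type.
restart_event_selector reason=CaCertRenewed type=Normal
```

The resulting string can be passed to the --field-selector option of oc get events.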

28.3. Checking Kafka restarts

Use an oc command to list restart events initiated by the Cluster Operator. To return only these events, set the Cluster Operator as the reporting component using the reportingController or source event field.

Prerequisites

  • The Cluster Operator is running in the OpenShift cluster.

Procedure

  1. Get all restart events emitted by the Cluster Operator:

    oc -n kafka get events --field-selector reportingController=strimzi.io/cluster-operator

    Example showing events returned

    LAST SEEN   TYPE     REASON                   OBJECT                        MESSAGE
    2m          Normal   CaCertRenewed            pod/strimzi-cluster-kafka-0   CA certificate renewed
    58m         Normal   PodForceRestartOnError   pod/strimzi-cluster-kafka-1   Pod needs to be forcibly restarted due to an error
    5m47s       Normal   ManualRollingUpdate      pod/strimzi-cluster-kafka-2   Pod was manually annotated to be rolled

    You can also specify a reason or other field-selector options to constrain the events returned.

    Here, a specific reason is added:

    oc -n kafka get events --field-selector reportingController=strimzi.io/cluster-operator,reason=PodForceRestartOnError

  2. Use an output format, such as YAML, to return more detailed information about one or more events.

    oc -n kafka get events --field-selector reportingController=strimzi.io/cluster-operator,reason=PodForceRestartOnError -o yaml

    Example showing detailed events output

    apiVersion: v1
    items:
    - action: StrimziInitiatedPodRestart
      apiVersion: v1
      eventTime: "2022-05-13T00:22:34.168086Z"
      firstTimestamp: null
      involvedObject:
        kind: Pod
        name: strimzi-cluster-kafka-1
        namespace: kafka
      kind: Event
      lastTimestamp: null
      message: Pod needs to be forcibly restarted due to an error
      metadata:
        creationTimestamp: "2022-05-13T00:22:34Z"
        generateName: strimzi-event
        name: strimzi-eventwppk6
        namespace: kafka
        resourceVersion: "432961"
        uid: 29fcdb9e-f2cf-4c95-a165-a5efcd48edfc
      reason: PodForceRestartOnError
      reportingController: strimzi.io/cluster-operator
      reportingInstance: strimzi-cluster-operator-6458cfb4c6-6bpdp
      source: {}
      type: Normal
    kind: List
    metadata:
      resourceVersion: ""
      selfLink: ""

The following fields are deprecated, so they are not populated for these events:

  • firstTimestamp
  • lastTimestamp
  • source
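Because firstTimestamp and lastTimestamp are not populated, the eventTime field is the one to read when you need the time of a restart. The following sketch prints the time, reason, and pod name for each restart event using a JSONPath output template. The function name list_restart_times is hypothetical, the namespace defaults to kafka as in the examples above, and the field paths are taken from the detailed YAML output.

```shell
# Hypothetical helper: print eventTime, reason, and pod name (tab-separated)
# for each restart event emitted by the Cluster Operator.
# Usage: list_restart_times [namespace]   (namespace defaults to "kafka")
list_restart_times() (
  namespace="${1:-kafka}"
  oc -n "$namespace" get events \
    --field-selector reportingController=strimzi.io/cluster-operator \
    -o jsonpath='{range .items[*]}{.eventTime}{"\t"}{.reason}{"\t"}{.involvedObject.name}{"\n"}{end}'
)

# Example call (requires access to a cluster):
#   list_restart_times kafka
```

The block only defines the function, so it has no effect until the function is called against a cluster you are logged in to.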

© 2024 Red Hat, Inc.