Chapter 8. Monitoring high availability services
Red Hat OpenStack on OpenShift (RHOSO) high availability (HA) uses Red Hat OpenShift Container Platform (RHOCP) operations to orchestrate failover and recovery deployment. When you plan your deployment, ensure that you review the considerations for different aspects of the environment, such as hardware assignments and network configuration.
The following shared control plane services are required to implement HA:
- Galera Cluster
- RabbitMQ
- memcached
These services run as pods, and they are managed and monitored by RHOCP.
You can use the OpenShift client command line interface (oc) to interact with the platform and retrieve information about the status of the OpenStack control plane services.
With oc, you can complete the following actions:
- List the pods
- Learn more about the pods' configuration
- Retrieve information about the pods' runtime
8.1. RHOSO Galera clusters
RHOSO deploys the following two Galera clusters:
- openstack. This cluster hosts the databases for all OpenStack services.
- openstack-cell1. This cluster hosts the databases specific to the Nova cell.
Galera custom resources (CRs) configure both clusters.
To retrieve more information about the Galera CRs, use the oc get galera command as shown in the following example:
$ oc get galera
NAME              READY   MESSAGE
openstack         True    Setup complete
openstack-cell1   True    Setup complete
The MESSAGE and READY columns show the startup state and the service availability of the Galera CR. When the Ready condition is True, the pods are started and ready to accept traffic as shown in the following example:
$ oc get pod -l galera/name=openstack
NAME                 READY   STATUS    RESTARTS   AGE
openstack-galera-0   1/1     Running   0          4h22m
openstack-galera-1   1/1     Running   0          4h22m
openstack-galera-2   1/1     Running   0          4h22m
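The readiness check described above can be automated. The following is a minimal Python sketch, not part of RHOSO, that takes the text printed by oc get pod and returns the pods that are not fully ready. The function name not_ready_pods and the SAMPLE constant are hypothetical; the sketch assumes the default column layout shown in the example above.

```python
from __future__ import annotations


def not_ready_pods(table: str) -> list[str]:
    """Return the names of pods that are not fully ready.

    A pod is flagged when its READY column is not n/n or when its
    STATUS column is not "Running". The input is the plain-text table
    printed by `oc get pod` (an assumption about the column layout).
    """
    lines = [ln for ln in table.splitlines() if ln.strip()]
    header = lines[0].split()
    name_i = header.index("NAME")
    ready_i = header.index("READY")
    status_i = header.index("STATUS")
    flagged = []
    for ln in lines[1:]:
        cols = ln.split()
        ready, total = cols[ready_i].split("/")
        if ready != total or cols[status_i] != "Running":
            flagged.append(cols[name_i])
    return flagged


# Sample copied from the example output in this section.
SAMPLE = """\
NAME                 READY   STATUS    RESTARTS   AGE
openstack-galera-0   1/1     Running   0          4h22m
openstack-galera-1   1/1     Running   0          4h22m
openstack-galera-2   1/1     Running   0          4h22m
"""

print(not_ready_pods(SAMPLE))  # prints []
```

In practice you might feed the function the captured output of oc get pod -l galera/name=openstack. An empty list means that every listed pod is running and ready.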
The mariadb operator performs the following Galera cluster operations:
- Creates the pods that host the mysqld servers.
- Runs the logic for bootstrapping a Galera cluster. For example, the mariadb operator starts the cluster using the most recent copy of the Galera database.
- Monitors the running Galera pods.
- Restarts the pods when the pods fail the healthcheck.
To expose the database service, the mariadb operator creates an OpenShift service object called openstack. The OpenStack service components access the database through the IP address provided by this service:
$ oc get service -l mariadb/name=openstack
NAME        TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
openstack   ClusterIP   10.217.5.210   <none>        3306/TCP   7h
The incoming traffic is load-balanced to any available Galera pod. OpenShift marks a pod as available based on its Readiness healthcheck. If a pod misbehaves or if a pod begins to stop, it is removed from the service’s list of endpoints.
- To view a list of available endpoints, use the following command:
The mariadb operator creates a ‘headless’ service for the Galera pods. This service is a DNS service between Galera pods for internal Galera cluster communication.
8.1.1. Monitoring Galera startup
- To monitor the startup of the Galera pods, use the oc describe galera command.
The status of the Galera CR records the startup state of a Galera cluster. The CR conditions report the status of the prerequisites that the Galera pods need to start. The Ready condition is True only when all the other conditions are True, as shown in the following example:
Status:
  Conditions:
    Last Transition Time:  2024-04-22T07:32:06Z
    Message:               Setup complete
    Reason:                Ready
    Status:                True
    Type:                  Ready
    Last Transition Time:  2024-04-22T07:31:49Z
    Message:               Deployment completed
    Reason:                Ready
    Status:                True
    Type:                  DeploymentReady
    Last Transition Time:  2024-04-22T07:31:11Z
    Message:               Exposing service completed
    Reason:                Ready
    Status:                True
    Type:                  ExposeServiceReady
    Last Transition Time:  2024-04-22T07:31:11Z
    Message:               Input data complete
    Reason:                Ready
    Status:                True
    Type:                  InputReady
    Last Transition Time:  2024-04-22T07:31:11Z
    Message:               RoleBinding created
    Reason:                Ready
    Status:                True
    Type:                  RoleBindingReady
    Last Transition Time:  2024-04-22T07:31:11Z
    Message:               Role created
    Reason:                Ready
    Status:                True
    Type:                  RoleReady
    Last Transition Time:  2024-04-22T07:31:11Z
    Message:               ServiceAccount created
    Reason:                Ready
    Status:                True
    Type:                  ServiceAccountReady
    Last Transition Time:  2024-04-22T07:31:11Z
    Message:               Service config create completed
    Reason:                Ready
    Status:                True
    Type:                  ServiceConfigReady
    Last Transition Time:  2024-04-22T07:31:11Z
    Message:               Input data complete
    Reason:                Ready
    Status:                True
    Type:                  TLSInputReady
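To find out which prerequisite is holding back the Ready condition, you can inspect the conditions programmatically. The following Python sketch is illustrative only. It assumes that you exported the CR with a command such as oc get galera openstack -o json and loaded it into a dictionary, and that the conditions use the usual Kubernetes JSON keys type and status. The helper names and the EXAMPLE_CR data are hypothetical.

```python
from __future__ import annotations


def unmet_conditions(cr: dict) -> list[str]:
    """List the condition types, other than Ready, that are not True."""
    conditions = cr.get("status", {}).get("conditions", [])
    return [
        c["type"]
        for c in conditions
        if c["type"] != "Ready" and c["status"] != "True"
    ]


def is_ready(cr: dict) -> bool:
    """Report whether the CR carries a Ready condition set to True."""
    conditions = cr.get("status", {}).get("conditions", [])
    return any(c["type"] == "Ready" and c["status"] == "True" for c in conditions)


# Hypothetical data: a CR whose deployment has not completed yet.
EXAMPLE_CR = {
    "status": {
        "conditions": [
            {"type": "Ready", "status": "False"},
            {"type": "DeploymentReady", "status": "False"},
            {"type": "ExposeServiceReady", "status": "True"},
            {"type": "InputReady", "status": "True"},
        ]
    }
}

print(is_ready(EXAMPLE_CR), unmet_conditions(EXAMPLE_CR))
# prints: False ['DeploymentReady']
```

For the fully started cluster shown in the example above, is_ready would return True and unmet_conditions would return an empty list.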
When the mariadb operator bootstraps a Galera cluster, it gathers information from every database replica, and then stores it in transient attributes. The transient attributes appear in the Galera CR’s status if the cluster is being inspected while the Galera cluster is stopped and being restarted:
Status:
  Attributes:
    openstack-galera-0:
      Seqno:  1232
    openstack-galera-1:
      Container ID:  cri-o://f56ec2389e878b462a54f5255dad83db29daf4d8e8cda338904bfd353b370165
      Gcomm:         gcomm://
      Seqno:         1232
    openstack-galera-2:
      Seqno:  1231
  Bootstrapped:  false
Before starting a Galera cluster, the mariadb operator starts all Galera pod replicas in a waiting state. Even if you can see the pods by using the oc get pods command, they have not started their mysqld servers yet. The mariadb operator introspects the content of each pod’s database copy to extract the database sequence number (Seqno). After the mariadb operator retrieves the Seqno information from all of the pods, it decides which pod holds the most recent version of the database and bootstraps a new Galera cluster from this pod. This pod starts a mysqld server, and a transient attribute Gcomm with the value gcomm:// appears in the Galera CR’s status. When the first mysqld server is ready to serve traffic, the attribute Bootstrapped becomes true, and the transient Attributes for this pod are removed from the Galera CR’s status.
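The selection step described above can be sketched as follows. This Python example is an illustration of the idea only, not the mariadb operator’s actual code. The function name pick_bootstrap_pod, the lowercase seqno key, and the tie-breaking rule (first pod in name order) are all assumptions. In the sample status above, the real operator picked openstack-galera-1 among two pods with the same Seqno, so its tie-breaking evidently differs from this sketch.

```python
from __future__ import annotations

from typing import Optional


def pick_bootstrap_pod(attributes: dict, replicas: int) -> Optional[str]:
    """Pick the pod to bootstrap from, or None if data is incomplete.

    `attributes` maps a pod name to the values gathered from its
    database copy. Selection waits until every replica has reported a
    sequence number, then chooses the highest one. Ties are resolved by
    pod name order, which is an arbitrary choice made for this sketch.
    """
    if len(attributes) < replicas:
        return None
    if any("seqno" not in attrs for attrs in attributes.values()):
        return None
    # max() keeps the first maximum it meets, so sorting the names first
    # makes the tie-break deterministic.
    return max(sorted(attributes), key=lambda pod: int(attributes[pod]["seqno"]))


# Sequence numbers taken from the sample status in this section.
gathered = {
    "openstack-galera-0": {"seqno": 1232},
    "openstack-galera-1": {"seqno": 1232},
    "openstack-galera-2": {"seqno": 1231},
}

chosen = pick_bootstrap_pod(gathered, replicas=3)
print(chosen, gathered[chosen]["seqno"])
```

The important property is that the cluster is never bootstrapped from a replica with a lower sequence number than another replica, because that copy of the database would be missing the most recent writes.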
8.2. RHOSO RabbitMQ clusters
RHOSO deploys the following two RabbitMQ clusters:
- rabbitmq. This cluster is used for messaging between OpenStack services.
- rabbitmq-cell1. This cluster is used only by Nova.
RabbitMQ custom resources (CRs) configure both clusters.
- To retrieve more information about the RabbitMQ CRs, use the following command:
$ oc get rabbitmq --show-labels
NAME             ALLREPLICASREADY   RECONCILESUCCESS   AGE   LABELS
rabbitmq         True               True               25h   <none>
rabbitmq-cell1   True               True               25h   <none>
The RabbitMQ-cluster operator completes the following tasks:
- Creates the pods that run the rabbitmq servers.
- Monitors the pods that run the rabbitmq servers.
- Restarts the pods that run the rabbitmq servers when healthchecks fail.
8.2.1. Monitoring the RabbitMQ operator’s startup
The state and the service availability of the rabbitmq-cluster operator and the rabbitmq clusters are exposed in the status of the RabbitMQ CR.
Procedure
- To retrieve information about the state and service availability of the rabbitmq-cluster operator and the rabbitmq clusters, use the following command:
Each RabbitMQ replica runs in a dedicated pod.
$ oc get pods -l app.kubernetes.io/name=rabbitmq
NAME                READY   STATUS    RESTARTS   AGE
rabbitmq-server-0   1/1     Running   0          46h
rabbitmq-server-1   1/1     Running   0          46h
rabbitmq-server-2   1/1     Running   0          46h
The rabbitmq-cluster operator creates two OpenShift service objects for a rabbitmq cluster. One service provides DNS name resolution to the rabbitmq servers for internal rabbitmq communication. The other service exposes the RabbitMQ messaging service and is managed by MetalLB:
$ oc get service -l app.kubernetes.io/name=rabbitmq
NAME             TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)                                          AGE
rabbitmq         LoadBalancer   172.30.170.63   172.17.0.85   5671:31331/TCP,15671:31404/TCP,15691:31453/TCP   47h
rabbitmq-nodes   ClusterIP      None            <none>        4369/TCP,25672/TCP                               47h
In this example, the MetalLB-managed service is called rabbitmq. This service acts as a load balancer across the RabbitMQ pods. Its IP address listens on the internal API network, so the service is accessible from both the OpenStack data plane and the control plane. MetalLB receives incoming traffic from the internal API network on 172.17.0.85 and forwards it to the service IP 172.30.170.63, which balances the traffic across the rabbitmq pods:
$ oc describe service rabbitmq
Name:                     rabbitmq
Namespace:                openstack
Labels:                   app.kubernetes.io/component=rabbitmq
                          app.kubernetes.io/name=rabbitmq
                          app.kubernetes.io/part-of=rabbitmq
Annotations:              dnsmasq.network.openstack.org/hostname: rabbitmq.openstack.svc
                          metallb.universe.tf/address-pool: internalapi
                          metallb.universe.tf/ip-allocated-from-pool: internalapi
                          metallb.universe.tf/loadBalancerIPs: 172.17.0.85
Selector:                 app.kubernetes.io/name=rabbitmq
Type:                     LoadBalancer
IP Family Policy:         SingleStack
IP Families:              IPv4
IP:                       172.30.170.63
IPs:                      172.30.170.63
LoadBalancer Ingress:     172.17.0.85
Port:                     amqps  5671/TCP
TargetPort:               5671/TCP
NodePort:                 amqps  31331/TCP
Endpoints:                192.168.16.69:5671,192.168.20.54:5671,192.168.24.45:5671
8.3. RHOSO memcached clusters
By default, all the OpenStack services in the control plane target a single memcached cluster that contains three memcached servers. This cluster is configured using a single memcached resource created by the openstack operator. The infra operator creates the pods that host the memcached servers and the OpenShift service objects that expose the memcached service.
8.3.1. Monitoring memcached startup
Procedure
- To monitor the memcached startup, use the oc get memcached command. You can view the startup state and service availability in the MESSAGE and READY columns:
$ oc get memcached
NAME        READY   MESSAGE
memcached   True    Setup complete
When a memcached CR is marked as Ready, its associated pods are started and ready to accept traffic. For example, here is a memcached cluster that is ready to accept traffic:
$ oc get pods -l memcached/name=memcached
NAME          READY   STATUS    RESTARTS   AGE
memcached-0   1/1     Running   0          2d4h
memcached-1   1/1     Running   0          15m
memcached-2   1/1     Running   0          15m

$ oc get service memcached
NAME        TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)               AGE
memcached   ClusterIP   None         <none>        11211/TCP,11212/TCP   2d4h
The OpenStack components access the memcached pods directly by name. The memcached service is used only to maintain a list of DNS records for each memcached pod.
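To illustrate what access by name can look like, the following Python sketch builds a per-pod server list from the standard Kubernetes naming scheme for statefulset pods behind a headless service (pod name, then service name, then namespace). This is an assumption for illustration: the source does not state the exact host names that the OpenStack components are configured with. The function name memcached_servers is hypothetical, the openstack namespace is taken from the other examples in this chapter, and port 11211 is taken from the service output above.

```python
from __future__ import annotations


def memcached_servers(
    replicas: int = 3,
    name: str = "memcached",
    namespace: str = "openstack",
    port: int = 11211,
) -> list[str]:
    """Build one host:port entry per memcached pod.

    Pods of a statefulset are named <name>-<ordinal>, and a headless
    service publishes one DNS record per pod under
    <pod>.<service>.<namespace>.svc. Treat the result as an
    illustration of per-pod addressing, not as the operator's output.
    """
    return [f"{name}-{i}.{name}.{namespace}.svc:{port}" for i in range(replicas)]


for server in memcached_servers():
    print(server)
```

Because each client addresses every pod individually, no load-balancing service IP is needed, which is consistent with the memcached service showing a cluster IP of None.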
8.4. Listing RHOSO control plane services pods
You can list your control plane service pods to understand which pods are running on your control plane.
Procedure
- Use the oc get pods command to list the pods:
$ oc get pods |egrep -e "galera|rabbit|memcache"
NAME                       READY   STATUS    RESTARTS   AGE
memcached-0                1/1     Running   0          28m
memcached-1                1/1     Running   0          28m
memcached-2                1/1     Running   0          28m
openstack-cell1-galera-0   1/1     Running   0          28m
openstack-cell1-galera-1   1/1     Running   0          28m
openstack-cell1-galera-2   1/1     Running   0          28m
openstack-galera-0         1/1     Running   0          28m
openstack-galera-1         1/1     Running   0          28m
openstack-galera-2         1/1     Running   0          28m
rabbitmq-cell1-server-0    1/1     Running   0          28m
rabbitmq-cell1-server-1    1/1     Running   0          28m
rabbitmq-cell1-server-2    1/1     Running   0          28m
rabbitmq-server-0          1/1     Running   0          28m
rabbitmq-server-1          1/1     Running   0          28m
rabbitmq-server-2          1/1     Running   0          28m
8.5. Listing the RHOSO High Availability operators
You can view the operators that your environment currently uses.
Procedure
- Use the following command to list the operators:
$ oc get operators
NAME                                            AGE
...
infra-operator.openstack-operators              9h
...
mariadb-operator.openstack-operators            9h
...
rabbitmq-cluster-operator.openstack-operators   9h
The infra-operator is responsible for the Memcached service.
8.6. Retrieving information about an operator’s Custom Resource
Procedure
- Use the following command to view the custom resource definition that an operator implements:
$ oc describe operator/infra-operator.openstack-operators |less
...
Status:
  Components:
    ...
    Kind:  CustomResourceDefinition
    Name:  memcacheds.memcached.openstack.org
...
- Use the following command to retrieve information about a custom resource’s definition:
$ oc describe crd/galeras.mariadb.openstack.org
Name:         galeras.mariadb.openstack.org
Namespace:
Labels:       operators.coreos.com/mariadb-operator.openstack-operators=
Annotations:  controller-gen.kubebuilder.io/version: v0.11.1
              operatorframework.io/installed-alongside-96a31840a95472ca: openstack-operators/mariadb-operator.v0.0.1
API Version:  apiextensions.k8s.io/v1
Kind:         CustomResourceDefinition
Metadata:
  Creation Timestamp:  2024-03-21T22:08:06Z
  Generation:          1
  Resource Version:    64637
  UID:                 f68caee7-b4ec-4713-8095-c4ee9b1fd13e
Spec:
....
For more information about operators, see What are Operators?
8.7. Retrieving information about an Operator’s statefulset
A statefulset manages the deployment and scaling of a set of pods. Each of the shared service operators is responsible for creating and managing a statefulset.
Procedure
- Use the oc get statefulset command to retrieve information about the operators’ statefulsets:
$ oc get statefulset |egrep -e "galera|rabbit|memcache"
NAME                     READY   AGE
memcached                1/1     174m
openstack-cell1-galera   3/3     174m
openstack-galera         3/3     174m
rabbitmq-cell1-server    3/3     174m
rabbitmq-server          3/3     174m
8.8. Retrieving more information about an Operator’s statefulset
You can retrieve the following information about the statefulset of each service:
- Basic information about the service. For example, the number of the replicas
- Actual container details. For example, environment variables
- Volume details
- Event details
Procedure
- To retrieve more information about the operator’s statefulset, use the oc describe statefulset/<operator_name> command.
Replace <operator_name> with the name of the statefulset that the operator manages, for example openstack-galera.
8.8.1. Basic information about a service’s statefulset
The following example shows the basic information that you can retrieve about a statefulset:
Name:               openstack-galera
Namespace:          openstack
CreationTimestamp:  Thu, 21 Mar 2024 08:39:59 -0400
Selector:           app=galera,cr=galera-openstack,galera/name=openstack,galera/namespace=openstack,galera/uid=1c93b3a3-1ac3-4f18-984d-34e9ce9dc12f,owner=mariadb-operator
Labels:             <none>
Annotations:        <none>
Replicas:           3 desired | 3 total
Update Strategy:    RollingUpdate
  Partition:        0
Pods Status:        3 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:           app=galera
                    cr=galera-openstack
                    galera/name=openstack
                    galera/namespace=openstack
                    galera/uid=1c93b3a3-1ac3-4f18-984d-34e9ce9dc12f
                    owner=mariadb-operator
  Service Account:  galera-openstack
  Init Containers:
   mysql-bootstrap:
    Image:      quay.io/podified-antelope-centos9/openstack-mariadb@sha256:7fa37f7dcdd850fb6e401c4d5f76d16ad53ecdd14d6a130dbf61f02b819dcdf6
    Port:       <none>
    Host Port:  <none>
    Command:
      bash
      /var/lib/operator-scripts/mysql_bootstrap.sh
    Environment:
      KOLLA_BOOTSTRAP:        True
      KOLLA_CONFIG_STRATEGY:  COPY_ALWAYS
      DB_ROOT_PASSWORD:       <set to the key 'DbRootPassword' in secret 'osp-secret'>  Optional: false
    Mounts:
      /var/lib/config-data/default from config-data-default (ro)
      /var/lib/config-data/generated from config-data-generated (rw)
      /var/lib/kolla/config_files from kolla-config (ro)
      /var/lib/mysql from mysql-db (rw)
      /var/lib/operator-scripts from operator-scripts (ro)
      /var/lib/secrets from secrets (ro)
... [cont]
8.8.2. Information about actual container of a service’s statefulset
The following example shows the container information that you can retrieve about a statefulset:
  Containers:
   galera:
    Image:       quay.io/podified-antelope-centos9/openstack-mariadb@sha256:7fa37f7dcdd850fb6e401c4d5f76d16ad53ecdd14d6a130dbf61f02b819dcdf6
    Ports:       3306/TCP, 4567/TCP
    Host Ports:  0/TCP, 0/TCP
    Command:
      /usr/bin/dumb-init
      --
      /usr/local/bin/kolla_start
    Liveness:   exec [/bin/bash /var/lib/operator-scripts/mysql_probe.sh liveness] delay=0s timeout=1s period=10s #success=1 #failure=3
    Readiness:  exec [/bin/bash /var/lib/operator-scripts/mysql_probe.sh readiness] delay=0s timeout=1s period=10s #success=1 #failure=3
    Startup:    exec [/bin/bash /var/lib/operator-scripts/mysql_probe.sh startup] delay=0s timeout=1s period=10s #success=1 #failure=30
    Environment:
      CR_CONFIG_HASH:         n558hf6h557hcfh589h688h684hb6h687h679h659h554h64fh77h76h568h695h5b6h8fh79h5c8h648h674hdch556h56bh655h64bh655h66ch5h5c4q
      KOLLA_CONFIG_STRATEGY:  COPY_ALWAYS
      DB_ROOT_PASSWORD:       <set to the key 'DbRootPassword' in secret 'osp-secret'>  Optional: false
    Mounts:
      /etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem from combined-ca-bundle (ro,path="tls-ca-bundle.pem")
      /var/lib/config-data/default from config-data-default (ro)
      /var/lib/config-data/generated from config-data-generated (rw)
      /var/lib/config-data/tls/certs/galera.crt from galera-tls-certs (ro,path="tls.crt")
      /var/lib/config-data/tls/private/galera.key from galera-tls-certs (ro,path="tls.key")
      /var/lib/kolla/config_files from kolla-config (ro)
      /var/lib/mysql from mysql-db (rw)
      /var/lib/operator-scripts from operator-scripts (ro)
      /var/lib/secrets from secrets (ro)
... [cont]
8.8.3. Information about the volumes of a service’s statefulset
The following example shows the volume information that you can retrieve about a statefulset:
  Volumes:
   secrets:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  osp-secret
    Optional:    false
   kolla-config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      openstack-config-data
    Optional:  false
   config-data-generated:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
   config-data-default:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      openstack-config-data
    Optional:  false
   operator-scripts:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      openstack-scripts
    Optional:  false
   galera-tls-certs:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  cert-galera-openstack-svc
    Optional:    false
   combined-ca-bundle:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  combined-ca-bundle
    Optional:    false
Volume Claims:
  Name:          mysql-db
  StorageClass:  local-storage
  Labels:        app=galera
                 cr=galera-openstack
                 galera/name=openstack
                 galera/namespace=openstack
                 galera/uid=1c93b3a3-1ac3-4f18-984d-34e9ce9dc12f
                 owner=mariadb-operator
  Annotations:   <none>
  Capacity:      5G
  Access Modes:  [ReadWriteOnce]
... [cont]
8.8.4. Information about Event details of a service’s statefulset
The following example shows the event details that you can retrieve about a statefulset:
Events:
  Type    Reason            Age   From                    Message
  ----    ------            ----  ----                    -------
  Normal  SuccessfulCreate  179m  statefulset-controller  create Claim mysql-db-openstack-galera-0 Pod openstack-galera-0 in statefulset openstack-galera success
  Normal  SuccessfulCreate  179m  statefulset-controller  create Pod openstack-galera-0 in statefulset openstack-galera successful
  Normal  SuccessfulCreate  179m  statefulset-controller  create Claim mysql-db-openstack-galera-1 Pod openstack-galera-1 in statefulset openstack-galera success
  Normal  SuccessfulCreate  179m  statefulset-controller  create Claim mysql-db-openstack-galera-2 Pod openstack-galera-2 in statefulset openstack-galera success
  Normal  SuccessfulCreate  179m  statefulset-controller  create Pod openstack-galera-1 in statefulset openstack-galera successful
  Normal  SuccessfulCreate  179m  statefulset-controller  create Pod openstack-galera-2 in statefulset openstack-galera successful
8.9. Checking the status of the control plane
Each operator monitors the status of the pods that it manages. If necessary, the operator takes appropriate action to keep at least one replica in a ready and running state.
Procedure
- Use the oc get pods command to check the status of your control plane shared services:
$ oc get pods |egrep -e "galera|rabbit|memcache"
NAME                       READY   STATUS    RESTARTS   AGE
memcached-0                1/1     Running   0          3h11m
memcached-1                1/1     Running   0          3h11m
memcached-2                1/1     Running   0          3h11m
openstack-cell1-galera-0   1/1     Running   0          3h11m
openstack-cell1-galera-1   1/1     Running   0          3h11m
openstack-cell1-galera-2   1/1     Running   0          3h11m
openstack-galera-0         1/1     Running   0          3h11m
openstack-galera-1         1/1     Running   0          3h11m
openstack-galera-2         1/1     Running   0          3h11m
rabbitmq-cell1-server-0    1/1     Running   0          3h11m
rabbitmq-cell1-server-1    1/1     Running   0          3h11m
rabbitmq-cell1-server-2    1/1     Running   0          3h11m
rabbitmq-server-0          1/1     Running   0          3h11m
rabbitmq-server-1          1/1     Running   0          3h11m
rabbitmq-server-2          1/1     Running   0          3h11m
8.9.1. Checking the status of a pod
Procedure
- To retrieve more information about a pod, use the oc describe pod/<pod-name> command.
Note: Replace <pod-name> with the name of the pod that you want to retrieve more information about.
$ oc describe pod/rabbitmq-server-0
Name:             rabbitmq-server-0
Namespace:        openstack
Priority:         0
Service Account:  rabbitmq-server
Node:             master-2/192.168.111.22
Start Time:       Thu, 21 Mar 2024 08:39:57 -0400
Labels:           app.kubernetes.io/component=rabbitmq
                  app.kubernetes.io/name=rabbitmq
                  app.kubernetes.io/part-of=rabbitmq
                  controller-revision-hash=rabbitmq-server-5c886b79b4
                  statefulset.kubernetes.io/pod-name=rabbitmq-server-0
Annotations:      k8s.ovn.org/pod-networks:
                    {"default":{"ip_addresses":["192.168.16.35/22"],"mac_address":"0a:58:c0:a8:10:23","gateway_ips":["192.168.16.1"],"routes":[{"dest":"192.16...
                  k8s.v1.cni.cncf.io/network-status:
                    [{
                        "name": "ovn-kubernetes",
                        "interface": "eth0",
                        "ips": [
                            "192.168.16.35"
                        ],
                        "mac": "0a:58:c0:a8:10:23",
                        "default": true,
                        "dns": {}
                    }]
                  openshift.io/scc: restricted-v2
                  seccomp.security.alpha.kubernetes.io/pod: runtime/default
Status:           Running
...
8.10. Exposure of each service through ClusterIP or LoadBalancer
Each service is exposed through either a ClusterIP or a LoadBalancer.
- To retrieve more information about the ClusterIPs or the LoadBalancers that expose a service, use the following command:
$ oc get svc |egrep -e "rabbit|galera|memcache"
memcached                ClusterIP      None            <none>        11211/TCP
openstack-cell1-galera   ClusterIP      None            <none>        3306/TCP
openstack-galera         ClusterIP      None            <none>        3306/TCP
rabbitmq                 LoadBalancer   172.30.21.129   172.17.0.85   5672:31952/TCP,15672:30111/TCP,15692:30081/TCP
rabbitmq-cell1           LoadBalancer   172.30.97.190   172.17.0.86   5672:30043/TCP,15672:30645/TCP,15692:32654/TCP
rabbitmq-cell1-nodes     ClusterIP      None            <none>        4369/TCP,25672/TCP
rabbitmq-nodes           ClusterIP      None            <none>        4369/TCP,25672/TCP
For more information about the OpenShift capabilities that you can use to expose the services, see About networking.
- Use the following command to retrieve more information about a service:
$ oc describe svc/rabbitmq
Name:                     rabbitmq
Namespace:                openstack
Labels:                   app.kubernetes.io/component=rabbitmq
                          app.kubernetes.io/name=rabbitmq
                          app.kubernetes.io/part-of=rabbitmq
Annotations:              dnsmasq.network.openstack.org/hostname: rabbitmq.openstack.svc
                          metallb.universe.tf/address-pool: internalapi
                          metallb.universe.tf/ip-allocated-from-pool: internalapi
                          metallb.universe.tf/loadBalancerIPs: 172.17.0.85
Selector:                 app.kubernetes.io/name=rabbitmq
Type:                     LoadBalancer
IP Family Policy:         SingleStack
IP Families:              IPv4
IP:                       172.30.21.129
IPs:                      172.30.21.129
LoadBalancer Ingress:     172.17.0.85
Port:                     amqp  5672/TCP
TargetPort:               5672/TCP
NodePort:                 amqp  31952/TCP
Endpoints:                192.168.16.43:5672,192.168.20.69:5672,192.168.24.53:5672
Port:                     management  15672/TCP
TargetPort:               15672/TCP
NodePort:                 management  30111/TCP
Endpoints:                192.168.16.43:15672,192.168.20.69:15672,192.168.24.53:15672
Port:                     prometheus  15692/TCP
TargetPort:               15692/TCP
NodePort:                 prometheus  30081/TCP
Endpoints:                192.168.16.43:15692,192.168.20.69:15692,192.168.24.53:15692
Session Affinity:         None
External Traffic Policy:  Cluster
Events:                   <none>
8.11. Testing the resilience of the control plane
To test that the control plane shared services are resilient to container failures, you can simulate a failure.
Procedure
- To simulate a failure, use the following command to delete one of the pods:
$ oc delete pod/rabbitmq-server-1
pod "rabbitmq-server-1" deleted
After you delete the pod, the rabbitmq-server-1 pod is immediately rescheduled:
$ oc get pods |grep rabbit
rabbitmq-cell1-server-0   1/1   Running    0   4h20m
rabbitmq-cell1-server-1   1/1   Running    0   4h20m
rabbitmq-cell1-server-2   1/1   Running    0   4h20m
rabbitmq-server-0         1/1   Running    0   4h20m
rabbitmq-server-1         0/1   Init:0/1   0   2s
rabbitmq-server-2         1/1   Running    0   4h20m
After a few seconds, the pod should have the status of Running:
$ oc get pods |grep rabbit
rabbitmq-cell1-server-0   1/1   Running   0   4h23m
rabbitmq-cell1-server-1   1/1   Running   0   4h23m
rabbitmq-cell1-server-2   1/1   Running   0   4h23m
rabbitmq-server-0         1/1   Running   0   4h23m
rabbitmq-server-1         1/1   Running   0   3m8s
rabbitmq-server-2         1/1   Running   0   4h23m
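Instead of rerunning oc get pods by hand, you can poll until the pod recovers. The following Python sketch shows one way to do that. It is a generic polling loop under stated assumptions: wait_until_running is a hypothetical helper, and the get_status callable is supplied by you, for example a function that runs oc get pod rabbitmq-server-1 -o jsonpath='{.status.phase}' and returns its output. The example below uses a fake status source so that it runs without a cluster.

```python
import time
from typing import Callable


def wait_until_running(
    get_status: Callable[[], str],
    timeout_s: float = 300.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll get_status() until it returns "Running" or the timeout expires.

    Returns True if the pod reached Running in time, False otherwise.
    The sleep and clock parameters exist only to make the loop testable.
    """
    deadline = clock() + timeout_s
    while True:
        if get_status() == "Running":
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


# Fake status source that stands in for a real query against the cluster:
# two polls report Pending, the third reports Running.
phases = iter(["Pending", "Pending", "Running"])
recovered = wait_until_running(lambda: next(phases), sleep=lambda s: None)
print(recovered)  # prints True
```

Note that a pod phase of Running does not by itself mean that the pod passes its readiness check. For a stricter test, also confirm that the READY column shows 1/1.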
8.11.1. The Taint-Based Evictions feature
By default, the Taint-Based Evictions feature evicts pods from a node that experiences specific conditions, such as not-ready and unreachable. When a node experiences one of these conditions, RHOCP adds taints to the node, evicts the pods, and then reschedules the pods on different nodes.
Taint-Based Evictions have a NoExecute effect. Any pod that does not tolerate the taint is evicted immediately, and any pod that does tolerate the taint is never evicted, unless the pod uses the tolerationSeconds parameter.
Use the tolerationSeconds parameter to specify how long a pod stays bound to a node that has a node condition. If the condition still exists after the tolerationSeconds period, the taint remains on the node and the pods with a matching toleration are evicted. If the condition clears before the tolerationSeconds period, pods with matching tolerations are not removed.
OpenShift Container Platform adds a toleration for node.kubernetes.io/not-ready and node.kubernetes.io/unreachable with tolerationSeconds=300, unless the Pod configuration specifies either toleration.
RHOSO 18.0 operators do not modify the default tolerationSeconds values. Pods that run on a faulty worker node take more than five minutes to be rescheduled.
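The eviction rules above can be summarized in a few lines of code. The following Python sketch models only the decision described in this section. It is not how the scheduler is implemented, and the function names are hypothetical. Time is measured from the moment the NoExecute taint is added to the node.

```python
from typing import Optional


def eviction_delay(tolerates: bool, toleration_seconds: Optional[int]) -> Optional[int]:
    """Return the seconds a pod survives after a NoExecute taint is added.

    0 means immediate eviction and None means the pod is never evicted.
    """
    if not tolerates:
        return 0                  # no matching toleration: evicted immediately
    return toleration_seconds     # None: tolerated indefinitely


def is_evicted(
    taint_age_s: float,
    tolerates: bool,
    toleration_seconds: Optional[int] = None,
) -> bool:
    """Decide whether a pod is evicted once the taint has lasted taint_age_s."""
    delay = eviction_delay(tolerates, toleration_seconds)
    return delay is not None and taint_age_s >= delay


# Default RHOCP toleration for not-ready and unreachable: 300 seconds.
print(is_evicted(120, tolerates=True, toleration_seconds=300))  # prints False
print(is_evicted(301, tolerates=True, toleration_seconds=300))  # prints True
```

With the default of 300 seconds, a pod on a failed node is evicted only after the taint has been present for five minutes. The taint itself is added only after the platform detects the node condition, which is why rescheduling takes more than five minutes in total.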
For more information, see Remediation, fencing, and maintenance.