Chapter 8. Monitoring high availability services

Red Hat OpenStack Services on OpenShift (RHOSO) high availability (HA) uses Red Hat OpenShift Container Platform (RHOCP) operations to orchestrate failover and recovery in your deployment. When you plan your deployment, review the considerations for the different aspects of the environment, such as hardware assignments and network configuration.

The following shared control plane services are required to implement HA:

  • Galera Cluster
  • RabbitMQ
  • memcached

These services run as pods, and they are managed and monitored by RHOCP.

You can use the OpenShift client command-line interface (oc) to interact with the platform and retrieve information about the status of the OpenStack control plane services.

You can use the OpenShift Client (oc) to complete the following actions:

  • List the pods
  • Learn more about the pods' configuration
  • Retrieve information about the pods' runtime
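For example, the following commands cover those three actions. This is a minimal sketch; the openstack namespace and the <pod_name> placeholder match the naming used in the outputs later in this chapter:

$ oc get pods -n openstack                  # list the pods
$ oc describe pod <pod_name> -n openstack   # pod configuration details
$ oc logs <pod_name> -n openstack           # pod runtime logs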

8.1. RHOSO Galera clusters

RHOSO deploys the following two Galera clusters:

  • openstack. This cluster hosts the databases for all OpenStack services.
  • openstack-cell1. This cluster hosts the databases that are specific to the Nova cell1 cell.

A Galera custom resource (CR) configures each cluster.

To retrieve more information about the Galera CRs, use the oc get galera command, as shown in the following example:

$ oc get galera
NAME            READY   MESSAGE
openstack       True    Setup complete
openstack-cell1 True    Setup complete

The MESSAGE and READY columns show the startup state and the service availability of the Galera CRs. When the Ready condition is True, the pods are started and ready to accept traffic, as shown in the following example:

$ oc get pod -l galera/name=openstack
NAME                 READY STATUS    RESTARTS    AGE
openstack-galera-0   1/1 	Running   0      	4h22m
openstack-galera-1   1/1 	Running   0      	4h22m
openstack-galera-2   1/1 	Running   0      	4h22m

The mariadb operator performs the following Galera cluster operations:

  • Creates the pods that host the mysqld servers.
  • Runs the logic for bootstrapping a Galera cluster. For example, the mariadb operator starts the cluster using the most recent copy of the Galera database.
  • Monitors the running Galera pods.
  • Restarts the pods when the pods fail the healthcheck.

To expose the database service, the mariadb operator creates an OpenShift service object called openstack. OpenStack service components access the database through the IP address that this service provides:

$ oc get service -l mariadb/name=openstack
NAME        TYPE        CLUSTER-IP     EXTERNAL-IP    PORT(S)    AGE
openstack   ClusterIP   10.217.5.210   <none>         3306/TCP   7h

Incoming traffic is load-balanced across the available Galera pods. OpenShift marks a pod as available based on its readiness healthcheck. If a pod misbehaves or starts to shut down, it is removed from the service’s list of endpoints.

  • To view the list of endpoints that are currently available behind the service, use the oc get endpoints command.
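For example, for the openstack service shown above (a minimal sketch; the endpoint addresses are the IP addresses of the Galera pods that pass their readiness healthcheck):

$ oc get endpoints openstack
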
Note

The mariadb operator also creates a headless service for the Galera pods. This service provides DNS resolution between the Galera pods for internal Galera cluster communication.
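The headless Galera services appear in the service listing with no cluster IP. For example, assuming that the headless service is the openstack-galera entry shown in the service listing later in this chapter:

$ oc get service openstack-galera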

8.1.1. Monitoring Galera startup

  • To monitor the startup of the Galera pods, use the oc describe galera command.
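For example, to inspect the openstack cluster from the earlier oc get galera output (a minimal sketch; the relevant information appears in the Status section of the output):

$ oc describe galera openstack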

The Galera CR’s status records the progress of a Galera cluster’s startup. The CR’s conditions report the status of the prerequisites that the Galera pods need in order to start, as shown in the following example:

Note

The Ready condition is True only when all of the other conditions are True.

Status:
  Conditions:
	Last Transition Time:  2024-04-22T07:32:06Z
	Message:           	Setup complete
	Reason:            	Ready
	Status:            	True
	Type:              	Ready
	Last Transition Time:  2024-04-22T07:31:49Z
	Message:           	Deployment completed
	Reason:            	Ready
	Status:            	True
	Type:              	DeploymentReady
	Last Transition Time:  2024-04-22T07:31:11Z
	Message:           	Exposing service completed
	Reason:            	Ready
	Status:            	True
	Type:              	ExposeServiceReady
	Last Transition Time:  2024-04-22T07:31:11Z
	Message:           	Input data complete
	Reason:            	Ready
	Status:            	True
	Type:              	InputReady
	Last Transition Time:  2024-04-22T07:31:11Z
	Message:           	RoleBinding created
	Reason:            	Ready
	Status:            	True
	Type:              	RoleBindingReady
	Last Transition Time:  2024-04-22T07:31:11Z
	Message:           	Role created
	Reason:            	Ready
	Status:            	True
	Type:              	RoleReady
	Last Transition Time:  2024-04-22T07:31:11Z
	Message:           	ServiceAccount created
	Reason:            	Ready
	Status:            	True
	Type:              	ServiceAccountReady
	Last Transition Time:  2024-04-22T07:31:11Z
	Message:           	Service config create completed
	Reason:            	Ready
	Status:            	True
	Type:              	ServiceConfigReady
	Last Transition Time:  2024-04-22T07:31:11Z
	Message:           	Input data complete
	Reason:            	Ready
	Status:            	True
	Type:              	TLSInputReady

When the mariadb operator bootstraps a Galera cluster, it gathers information from every database replica and stores it in transient attributes. These transient attributes appear in the Galera CR’s status if you inspect the CR while the Galera cluster is stopped and restarting:

Status:
  Attributes:
	openstack-galera-0:
  	Seqno:  1232
	openstack-galera-1:
  	Container ID:  cri-o://f56ec2389e878b462a54f5255dad83db29daf4d8e8cda338904bfd353b370165
  	Gcomm:     	gcomm://
  	Seqno:     	1232
	openstack-galera-2:
  	Seqno: 	1231
  Bootstrapped:  false

Before starting a Galera cluster, the mariadb operator starts all Galera pod replicas in a waiting state. Even though the pods are visible in the oc get pods output, they have not started their mysqld servers yet. The mariadb operator introspects the content of each pod’s database copy to extract its database sequence number (Seqno). When the mariadb operator has retrieved the Seqno information from all of the pods, it determines which pod holds the most recent version of the database and bootstraps a new Galera cluster from that pod. That pod starts a mysqld server, and a transient Gcomm attribute appears in the Galera CR’s status. When the first mysqld server is ready to serve traffic, the Bootstrapped attribute becomes true, and the transient attributes for this pod are removed from the Galera CR’s status.
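To observe these transient attributes while a cluster restarts, you can dump the full CR, for example for the openstack cluster (a minimal sketch; the attributes and the Bootstrapped flag appear under the status section of the YAML output):

$ oc get galera openstack -o yaml | less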

8.2. RHOSO RabbitMQ clusters

RHOSO deploys the following two RabbitMQ clusters:

  • rabbitmq. This cluster is used for messaging between OpenStack services.
  • rabbitmq-cell1. This cluster is used only by Nova.

A RabbitMQ custom resource (CR) configures each cluster.

  • To retrieve more information about the RabbitMQ clusters, use the following command:
$ oc get rabbitmq --show-labels
NAME         	ALLREPLICASREADY   RECONCILESUCCESS   AGE   LABELS
rabbitmq     	True           	True           	25h   <none>
rabbitmq-cell1   True           	True           	25h   <none>

The rabbitmq-cluster operator completes the following tasks:

  • Creates the pods that run the rabbitmq servers.
  • Monitors the pods that run the rabbitmq servers.
  • Restarts the pods that run the rabbitmq servers when healthchecks fail.
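For example, to watch the rabbitmq-cluster operator replace a failed replica in real time, you can watch the RabbitMQ pods (a minimal sketch that uses the pod label shown in the next section):

$ oc get pods -l app.kubernetes.io/name=rabbitmq -w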

8.2.1. Monitoring the RabbitMQ operator’s startup

The state and service availability of the rabbitmq-cluster operator and the RabbitMQ clusters are exposed in the status of the RabbitMQ CRs.

Procedure

  • To retrieve information about the state and service availability of the rabbitmq-cluster operator and the rabbitmq clusters, use the following command:
Note

Each RabbitMQ replica runs in a dedicated pod.

$ oc get pods -l app.kubernetes.io/name=rabbitmq
NAME            	READY   STATUS	RESTARTS  	AGE
rabbitmq-server-0   1/1 	Running   0                46h
rabbitmq-server-1   1/1 	Running   0                46h
rabbitmq-server-2   1/1 	Running   0                46h
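To see the detailed conditions that the rabbitmq-cluster operator records for a cluster, you can also describe the RabbitMQ CR (a minimal sketch; the CR name rabbitmq comes from the earlier oc get rabbitmq output):

$ oc describe rabbitmq rabbitmq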

The rabbitmq-cluster operator creates two OpenShift service objects for each RabbitMQ cluster. One service provides DNS name resolution for the rabbitmq servers, for internal RabbitMQ communication. The other service exposes the RabbitMQ messaging service through an OpenShift service managed by MetalLB:

$ oc get service -l app.kubernetes.io/name=rabbitmq
NAME         	TYPE       	CLUSTER-IP  	EXTERNAL-IP   PORT(S)                   	AGE
rabbitmq     	LoadBalancer   172.30.170.63   172.17.0.85   5671:31331/TCP,15671:31404/TCP,15691:31453/TCP   47h
rabbitmq-nodes   ClusterIP  	None        	<none>    	4369/TCP,25672/TCP                           	47h

In this example, the MetalLB-managed service is called rabbitmq. This service acts as a load balancer across the RabbitMQ pods. It has an IP address on the internal API network, so it is accessible from both the OpenStack data plane and control plane. MetalLB receives incoming traffic from the internal API network on 172.17.0.85 and forwards it to the service IP 172.30.170.63, which balances the traffic across the rabbitmq pods:

$ oc describe service rabbitmq
Name:                 	rabbitmq
Namespace:            	openstack
Labels:               	app.kubernetes.io/component=rabbitmq
                      	app.kubernetes.io/name=rabbitmq
                      	app.kubernetes.io/part-of=rabbitmq
Annotations:          	dnsmasq.network.openstack.org/hostname: rabbitmq.openstack.svc
                      	metallb.universe.tf/address-pool: internalapi
                      	metallb.universe.tf/ip-allocated-from-pool: internalapi
                      	metallb.universe.tf/loadBalancerIPs: 172.17.0.85
Selector:             	app.kubernetes.io/name=rabbitmq
Type:                 	LoadBalancer
IP Family Policy:     	SingleStack
IP Families:          	IPv4
IP:                   	172.30.170.63
IPs:                  	172.30.170.63
LoadBalancer Ingress: 	172.17.0.85
Port:                 	amqps  5671/TCP
TargetPort:           	5671/TCP
NodePort:             	amqps  31331/TCP
Endpoints:            	192.168.16.69:5671,192.168.20.54:5671,192.168.24.45:5671

8.3. RHOSO memcached clusters

By default, all the OpenStack services in the control plane target a single memcached cluster that contains three memcached servers. This cluster is configured using a single memcached resource created by the openstack operator. The infra operator creates the pods that host the memcached servers and the OpenShift service objects that expose the memcached service.
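To review the configuration of this cluster, you can describe the memcached CR (a minimal sketch; the CR name memcached matches the oc get memcached output in the next section):

$ oc describe memcached memcached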

8.3.1. Monitoring memcached startup

Procedure

  • To monitor the memcached startup, use the oc get memcached command. You can view the startup state and service availability in the MESSAGE and READY columns:
$ oc get memcached
NAME        READY   MESSAGE
memcached   True	Setup complete

When a memcached CR is marked as Ready, its associated pods are started and ready to accept traffic. For example, the following memcached cluster is ready to accept traffic:

$ oc get pods -l memcached/name=memcached
NAME      	READY   STATUS	RESTARTS   AGE
memcached-0   1/1 	Running   0      	2d4h
memcached-1   1/1 	Running   0      	15m
memcached-2   1/1 	Running   0      	15m

$ oc get service memcached
NAME        TYPE        CLUSTER-IP   EXTERNAL-IP     PORT(S)           	  AGE
memcached   ClusterIP   None     	    <none>          11211/TCP,11212/TCP   2d4h

OpenStack components access the memcached pods directly by name. The memcached service is used only to maintain DNS records for each memcached pod.
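To list the pod IP addresses that back those DNS records, you can query the endpoints of the memcached service (a minimal sketch that uses the standard endpoints resource):

$ oc get endpoints memcached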

8.4. Listing RHOSO control plane services pods

You can list your control plane service pods to understand which pods are running on your control plane.

Procedure

  • Use the oc get pods command to list the pods:

    $ oc get pods |egrep -e "galera|rabbit|memcache"
    NAME                            		READY   STATUS  RESTARTS   	AGE
    memcached-0                            	1/1 	Running 	0      	28m
    memcached-1                            	1/1 	Running 	0      	28m
    memcached-2                            	1/1 	Running 	0      	28m
    openstack-cell1-galera-0                1/1 	Running 	0      	28m
    openstack-cell1-galera-1                1/1 	Running 	0      	28m
    openstack-cell1-galera-2                1/1 	Running 	0      	28m
    openstack-galera-0                    	1/1 	Running 	0      	28m
    openstack-galera-1                    	1/1 	Running 	0      	28m
    openstack-galera-2                     	1/1 	Running 	0      	28m
    rabbitmq-cell1-server-0                 1/1 	Running 	0      	28m
    rabbitmq-cell1-server-1                 1/1 	Running 	0      	28m
    rabbitmq-cell1-server-2                 1/1 	Running 	0      	28m
    rabbitmq-server-0                     	1/1 	Running 	0      	28m
    rabbitmq-server-1                     	1/1 	Running 	0      	28m
    rabbitmq-server-2                      	1/1 	Running 	0      	28m

8.5. Listing the RHOSO High Availability operators

You can view the operators that your environment currently uses.

Procedure

  • Use the following command to list these operators:
$ oc get operators
NAME                                                	AGE
...
infra-operator.openstack-operators                  	9h
...
mariadb-operator.openstack-operators                	9h
...
rabbitmq-cluster-operator.openstack-operators       	9h
Note

The infra-operator is responsible for the Memcached service.

8.6. Retrieving information about an operator’s Custom Resource

Procedure

  1. Use the following command to view the custom resource definition that an operator implements:

    $ oc describe operator/infra-operator.openstack-operators |less
    ...
    Status:
      Components:
    ...
      	Kind:                	CustomResourceDefinition
      	Name:                	memcacheds.memcached.openstack.org
    ...
  2. Use the following command to retrieve information about a custom resource definition:
$ oc describe crd/galeras.mariadb.openstack.org
Name:     	galeras.mariadb.openstack.org
Namespace:
Labels:   	operators.coreos.com/mariadb-operator.openstack-operators=
Annotations:  controller-gen.kubebuilder.io/version: v0.11.1
          	operatorframework.io/installed-alongside-96a31840a95472ca: openstack-operators/mariadb-operator.v0.0.1
API Version:  apiextensions.k8s.io/v1
Kind:     	CustomResourceDefinition
Metadata:
  Creation Timestamp:  2024-03-21T22:08:06Z
  Generation:      	1
  Resource Version:	64637
  UID:             	f68caee7-b4ec-4713-8095-c4ee9b1fd13e
Spec:
....
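You can also list every CRD that belongs to an operator by filtering on the label shown in the previous output, for example:

$ oc get crd -l operators.coreos.com/mariadb-operator.openstack-operators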

For more information about operators, see What are Operators?

8.7. Retrieving information about an Operator’s statefulset

A statefulset manages the deployment and scaling of a set of pods. Each of the shared service operators is responsible for creating and managing a statefulset.

Procedure

  • Use the oc get statefulset command to retrieve information about the operators' statefulsets:
$ oc get statefulset |egrep -e "galera|rabbit|memcache"
NAME                      			READY   AGE
memcached                 			1/1 	174m
openstack-cell1-galera    		3/3 	174m
openstack-galera          		3/3 	174m
rabbitmq-cell1-server     		3/3 	174m
rabbitmq-server           		3/3 	174m

8.8. Retrieving more information about an Operator’s statefulset

You can retrieve the following information about the statefulset of each service:

  • Basic information about the service. For example, the number of replicas
  • Actual container details. For example, environment variables
  • Volume details
  • Event details

Procedure

  • To retrieve more information about an operator’s statefulset, use the oc describe statefulset/<operator_name> command.

Replace <operator_name> with the name of the operator that you want to retrieve more information about.
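For example, to inspect the Galera statefulset from the previous listing, which produces the output that is described in the following sections:

$ oc describe statefulset/openstack-galera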

8.8.1. Basic information about a service’s statefulset

The following example shows the basic information that you can retrieve about an operator:

Name:           	openstack-galera
Namespace:      	openstack
CreationTimestamp:  Thu, 21 Mar 2024 08:39:59 -0400
Selector:       	app=galera,cr=galera-openstack,galera/name=openstack,galera/namespace=openstack,galera/uid=1c93b3a3-1ac3-4f18-984d-34e9ce9dc12f,owner=mariadb-operator
Labels:         	<none>
Annotations:    	<none>
Replicas:       	3 desired | 3 total
Update Strategy:	RollingUpdate
  Partition:    	0
Pods Status:    	3 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:       	app=galera
                	cr=galera-openstack
                	galera/name=openstack
                	galera/namespace=openstack
                	galera/uid=1c93b3a3-1ac3-4f18-984d-34e9ce9dc12f
                	owner=mariadb-operator
  Service Account:  galera-openstack
  Init Containers:
   mysql-bootstrap:
	Image:  	quay.io/podified-antelope-centos9/openstack-mariadb@sha256:7fa37f7dcdd850fb6e401c4d5f76d16ad53ecdd14d6a130dbf61f02b819dcdf6
	Port:   	<none>
	Host Port:  <none>
	Command:
  	bash
  	/var/lib/operator-scripts/mysql_bootstrap.sh
	Environment:
  	KOLLA_BOOTSTRAP:    	True
  	KOLLA_CONFIG_STRATEGY:  COPY_ALWAYS
  	DB_ROOT_PASSWORD:   	<set to the key 'DbRootPassword' in secret 'osp-secret'>  Optional: false
	Mounts:
  	/var/lib/config-data/default from config-data-default (ro)
  	/var/lib/config-data/generated from config-data-generated (rw)
  	/var/lib/kolla/config_files from kolla-config (ro)
  	/var/lib/mysql from mysql-db (rw)
  	/var/lib/operator-scripts from operator-scripts (ro)
  	/var/lib/secrets from secrets (ro)
... [cont]

8.8.2. Information about the containers of a service’s statefulset

The following example shows the container information that you can retrieve about an operator:

 Containers:
   galera:
	Image:   	quay.io/podified-antelope-centos9/openstack-mariadb@sha256:7fa37f7dcdd850fb6e401c4d5f76d16ad53ecdd14d6a130dbf61f02b819dcdf6
	Ports:   	3306/TCP, 4567/TCP
	Host Ports:  0/TCP, 0/TCP
	Command:
  	/usr/bin/dumb-init
  	--
  	/usr/local/bin/kolla_start
	Liveness:   exec [/bin/bash /var/lib/operator-scripts/mysql_probe.sh liveness] delay=0s timeout=1s period=10s #success=1 #failure=3
	Readiness:  exec [/bin/bash /var/lib/operator-scripts/mysql_probe.sh readiness] delay=0s timeout=1s period=10s #success=1 #failure=3
	Startup:	exec [/bin/bash /var/lib/operator-scripts/mysql_probe.sh startup] delay=0s timeout=1s period=10s #success=1 #failure=30
	Environment:
  	CR_CONFIG_HASH:     	n558hf6h557hcfh589h688h684hb6h687h679h659h554h64fh77h76h568h695h5b6h8fh79h5c8h648h674hdch556h56bh655h64bh655h66ch5h5c4q
  	KOLLA_CONFIG_STRATEGY:  COPY_ALWAYS
  	DB_ROOT_PASSWORD:   	<set to the key 'DbRootPassword' in secret 'osp-secret'>  Optional: false
	Mounts:
  	/etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem from combined-ca-bundle (ro,path="tls-ca-bundle.pem")
  	/var/lib/config-data/default from config-data-default (ro)
  	/var/lib/config-data/generated from config-data-generated (rw)
  	/var/lib/config-data/tls/certs/galera.crt from galera-tls-certs (ro,path="tls.crt")
  	/var/lib/config-data/tls/private/galera.key from galera-tls-certs (ro,path="tls.key")
  	/var/lib/kolla/config_files from kolla-config (ro)
  	/var/lib/mysql from mysql-db (rw)
  	/var/lib/operator-scripts from operator-scripts (ro)
  	/var/lib/secrets from secrets (ro)
... [cont]

8.8.3. Information about the volumes of a service’s statefulset

The following example shows the volume information that you can retrieve about an operator:

 Volumes:
   secrets:
	Type:    	Secret (a volume populated by a Secret)
	SecretName:  osp-secret
	Optional:	false
   kolla-config:
	Type:  	ConfigMap (a volume populated by a ConfigMap)
	Name:  	openstack-config-data
	Optional:  false
   config-data-generated:
	Type:   	EmptyDir (a temporary directory that shares a pod's lifetime)
	Medium:
	SizeLimit:  <unset>
   config-data-default:
	Type:  	ConfigMap (a volume populated by a ConfigMap)
	Name:  	openstack-config-data
	Optional:  false
   operator-scripts:
	Type:  	ConfigMap (a volume populated by a ConfigMap)
	Name:  	openstack-scripts
	Optional:  false
   galera-tls-certs:
	Type:    	Secret (a volume populated by a Secret)
	SecretName:  cert-galera-openstack-svc
	Optional:	false
   combined-ca-bundle:
	Type:    	Secret (a volume populated by a Secret)
	SecretName:  combined-ca-bundle
	Optional:	false
Volume Claims:
  Name:      	mysql-db
  StorageClass:  local-storage
  Labels:    	app=galera
             	cr=galera-openstack
             	galera/name=openstack
             	galera/namespace=openstack
             	galera/uid=1c93b3a3-1ac3-4f18-984d-34e9ce9dc12f
             	owner=mariadb-operator
  Annotations:   <none>
  Capacity:  	5G
  Access Modes:  [ReadWriteOnce]
... [cont]

8.8.4. Information about Event details of a service’s statefulset

The following example shows the Event details that you can retrieve about an operator:

Events:
  Type	Reason        	Age   From                	Message
  ----	------        	----  ----                	-------
  Normal  SuccessfulCreate  179m  statefulset-controller  create Claim mysql-db-openstack-galera-0 Pod openstack-galera-0 in statefulset openstack-galera success
  Normal  SuccessfulCreate  179m  statefulset-controller  create Pod openstack-galera-0 in statefulset openstack-galera successful
  Normal  SuccessfulCreate  179m  statefulset-controller  create Claim mysql-db-openstack-galera-1 Pod openstack-galera-1 in statefulset openstack-galera success
  Normal  SuccessfulCreate  179m  statefulset-controller  create Claim mysql-db-openstack-galera-2 Pod openstack-galera-2 in statefulset openstack-galera success
  Normal  SuccessfulCreate  179m  statefulset-controller  create Pod openstack-galera-1 in statefulset openstack-galera successful
  Normal  SuccessfulCreate  179m  statefulset-controller  create Pod openstack-galera-2 in statefulset openstack-galera successful

8.9. Checking the status of the control plane

Each operator monitors the status of the pods that it manages. If necessary, it takes appropriate action to keep the replicas in a ready and Running state.

Procedure

  • Use the oc get pods command to check the status of your control plane shared services:

    $ oc get pods |egrep -e "galera|rabbit|memcache"
    NAME                            	READY   STATUS  	RESTARTS   AGE
    memcached-0                        1/1  Running 	0      	3h11m
    memcached-1                        1/1  Running 	0      	3h11m
    memcached-2                        1/1  Running 	0      	3h11m
    openstack-cell1-galera-0           1/1 	Running 	0      	3h11m
    openstack-cell1-galera-1           1/1 	Running 	0      	3h11m
    openstack-cell1-galera-2           1/1 	Running 	0      	3h11m
    openstack-galera-0                 1/1 	Running 	0      	3h11m
    openstack-galera-1                 1/1 	Running 	0      	3h11m
    openstack-galera-2                 1/1 	Running 	0      	3h11m
    rabbitmq-cell1-server-0            1/1 	Running 	0      	3h11m
    rabbitmq-cell1-server-1            1/1 	Running 	0      	3h11m
    rabbitmq-cell1-server-2            1/1 	Running 	0      	3h11m
    rabbitmq-server-0                  1/1 	Running 	0      	3h11m
    rabbitmq-server-1                  1/1 	Running 	0      	3h11m
    rabbitmq-server-2                  1/1 	Running 	0      	3h11m

8.9.1. Checking the status of a pod

Procedure

  • You can retrieve more information about a pod using the oc describe pod/<pod-name> command.

    Note

    Replace <pod-name> with the name of the pod that you want to retrieve more information about.

$ oc describe pod/rabbitmq-server-0
Name:         	rabbitmq-server-0
Namespace:    	openstack
Priority:     	0
Service Account:  rabbitmq-server
Node:         	master-2/192.168.111.22
Start Time:   	Thu, 21 Mar 2024 08:39:57 -0400
Labels:       	app.kubernetes.io/component=rabbitmq
              	app.kubernetes.io/name=rabbitmq
              	app.kubernetes.io/part-of=rabbitmq
              	controller-revision-hash=rabbitmq-server-5c886b79b4
              	statefulset.kubernetes.io/pod-name=rabbitmq-server-0
Annotations:  	k8s.ovn.org/pod-networks:
                	{"default":{"ip_addresses":["192.168.16.35/22"],"mac_address":"0a:58:c0:a8:10:23","gateway_ips":["192.168.16.1"],"routes":[{"dest":"192.16...
              	k8s.v1.cni.cncf.io/network-status:
                	[{
                    	"name": "ovn-kubernetes",
                    	"interface": "eth0",
                    	"ips": [
                        	"192.168.16.35"
                    	],
                    	"mac": "0a:58:c0:a8:10:23",
                    	"default": true,
                    	"dns": {}
                	}]
              	openshift.io/scc: restricted-v2
              	seccomp.security.alpha.kubernetes.io/pod: runtime/default
Status:       	Running
...

8.10. Exposure of each service through ClusterIP or LoadBalancer

Each service is exposed through either a ClusterIP or a LoadBalancer service.

  • To retrieve more information about the ClusterIP and LoadBalancer services that expose a service, use the following command:
$ oc get svc |egrep -e "rabbit|galera|memcache"
memcached                  ClusterIP      None            <none>        11211/TCP
openstack-cell1-galera     ClusterIP      None            <none>        3306/TCP
openstack-galera           ClusterIP      None            <none>        3306/TCP
rabbitmq                   LoadBalancer   172.30.21.129   172.17.0.85   5672:31952/TCP,15672:30111/TCP,15692:30081/TCP
rabbitmq-cell1             LoadBalancer   172.30.97.190   172.17.0.86   5672:30043/TCP,15672:30645/TCP,15692:32654/TCP
rabbitmq-cell1-nodes       ClusterIP      None            <none>        4369/TCP,25672/TCP
rabbitmq-nodes             ClusterIP      None            <none>        4369/TCP,25672/TCP

For more information about the OpenShift capabilities that you can use to expose the services, see About networking.

  • Use the following command to retrieve more information about a service:
$ oc describe svc/rabbitmq
Name:                 	rabbitmq
Namespace:            	openstack
Labels:               	app.kubernetes.io/component=rabbitmq
                      	app.kubernetes.io/name=rabbitmq
                      	app.kubernetes.io/part-of=rabbitmq
Annotations:          	dnsmasq.network.openstack.org/hostname: rabbitmq.openstack.svc
                      	metallb.universe.tf/address-pool: internalapi
                      	metallb.universe.tf/ip-allocated-from-pool: internalapi
                      	metallb.universe.tf/loadBalancerIPs: 172.17.0.85
Selector:             	app.kubernetes.io/name=rabbitmq
Type:                 	LoadBalancer
IP Family Policy:     	SingleStack
IP Families:          	IPv4
IP:                   	172.30.21.129
IPs:                  	172.30.21.129
LoadBalancer Ingress: 	172.17.0.85
Port:                 	amqp  5672/TCP
TargetPort:           	5672/TCP
NodePort:             	amqp  31952/TCP
Endpoints:            	192.168.16.43:5672,192.168.20.69:5672,192.168.24.53:5672
Port:                 	management  15672/TCP
TargetPort:           	15672/TCP
NodePort:             	management  30111/TCP
Endpoints:            	192.168.16.43:15672,192.168.20.69:15672,192.168.24.53:15672
Port:                 	prometheus  15692/TCP
TargetPort:           	15692/TCP
NodePort:             	prometheus  30081/TCP
Endpoints:            	192.168.16.43:15692,192.168.20.69:15692,192.168.24.53:15692
Session Affinity:     	None
External Traffic Policy:  Cluster
Events:               	<none>

8.11. Testing the resilience of the control plane

To test that the control plane shared services are resilient to container failures, you can simulate a failure.

Procedure

  • To simulate a failure, you can use the following command to delete one of the pods:
$ oc delete pod/rabbitmq-server-1
pod "rabbitmq-server-1" deleted

After you delete the pod, the rabbitmq-server-1 pod is immediately rescheduled:

$ oc get pods |grep rabbit
rabbitmq-cell1-server-0                     	1/1 	Running 	0      	4h20m
rabbitmq-cell1-server-1                     	1/1 	Running 	0      	4h20m
rabbitmq-cell1-server-2                     	1/1 	Running 	0      	4h20m
rabbitmq-server-0                              	1/1 	Running 	0      	4h20m
rabbitmq-server-1                              	0/1 	Init:0/1	0      	2s
rabbitmq-server-2                              	1/1 	Running 	0      	4h20m

After a few seconds, the pod should have a status of Running:

$ oc get pods |grep rabbit
rabbitmq-cell1-server-0                      	1/1 	Running 	0      	4h23m
rabbitmq-cell1-server-1                      	1/1 	Running 	0      	4h23m
rabbitmq-cell1-server-2                      	1/1 	Running 	0      	4h23m
rabbitmq-server-0                              	1/1 	Running 	0      	4h23m
rabbitmq-server-1                              	1/1 	Running 	0      	3m8s
rabbitmq-server-2                              	1/1 	Running 	0      	4h23m
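You can also confirm that the RabbitMQ CR reports all replicas as ready again (a minimal check; the ALLREPLICASREADY column returns to True when the cluster has fully recovered):

$ oc get rabbitmq rabbitmq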

8.11.1. The Taint-Based Evictions feature

By default, the Taint-Based Evictions feature evicts pods from a node that experiences specific conditions, such as not-ready and unreachable. When a node experiences one of these conditions, RHOCP adds taints to the node, evicts the pods, and reschedules them on different nodes.

Taint-Based Evictions have a NoExecute effect: any pod that does not tolerate the taint is evicted immediately, and any pod that does tolerate the taint is never evicted, unless the pod uses the tolerationSeconds parameter.

Use the tolerationSeconds parameter to specify how long a pod stays bound to a node that has a node condition. If the condition still exists after the tolerationSeconds period, the taint remains on the node and the pods with a matching toleration are evicted. If the condition clears before the tolerationSeconds period, pods with matching tolerations are not removed.

OpenShift Container Platform adds a toleration for node.kubernetes.io/not-ready and node.kubernetes.io/unreachable with tolerationSeconds=300, unless the Pod configuration specifies either toleration.
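To verify the tolerations that RHOCP added to a pod, you can inspect the pod specification (a minimal sketch that uses one of the RabbitMQ pods; spec.tolerations is the standard Kubernetes field):

$ oc get pod rabbitmq-server-0 -o jsonpath='{.spec.tolerations}{"\n"}'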

Important

RHOSO 18.0 operators do not modify the default tolerationSeconds values. As a result, pods that run on a faulty worker node can take more than five minutes to be rescheduled.

For more information, see Remediation, fencing, and maintenance.
