Installing a Two-Node OpenShift Cluster
Installing OpenShift Container Platform on two nodes
Abstract
This document describes how to install a two-node OpenShift Container Platform cluster, either with an arbiter node or with fencing.
Chapter 1. Two-Node with Arbiter
A Two-Node OpenShift with Arbiter (TNA) cluster is a compact, cost-effective OpenShift Container Platform topology. The topology consists of two control plane nodes and a lightweight arbiter node. The arbiter node stores the full etcd data, maintaining an etcd quorum and preventing split brain. The arbiter node does not run the additional control plane components kube-apiserver and kube-controller-manager, nor does it run workloads.
To install a Two-Node OpenShift with Arbiter cluster, assign an arbiter role to at least one of the nodes and set the control plane node count for the cluster to 2. Although OpenShift Container Platform does not currently impose a limit on the number of arbiter nodes, the typical deployment includes only one to minimize the use of hardware resources.
After installation, you can add additional arbiter nodes to a Two-Node OpenShift with Arbiter cluster but not to a standard multi-node cluster. It is also not possible to convert between a Two-Node OpenShift with Arbiter and standard topology.
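As an illustration of how this is expressed, an install-config.yaml for this topology adds an arbiter machine pool alongside the control plane pool. The pool names and exact syntax below are assumptions for illustration only; follow the linked installation procedures for the authoritative format:

controlPlane:
  name: master
  replicas: 2        # two full control plane nodes
arbiter:
  name: arbiter
  replicas: 1        # one lightweight arbiter node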
You can install a Two-Node OpenShift with Arbiter cluster by using one of the following methods:
- Installing on bare metal: Configuring a local arbiter node
- Installing with the Agent-based Installer: Configuring a local arbiter node
Chapter 2. Two-node with Fencing
2.1. Preparing to install a two-node OpenShift cluster with fencing
Two-node OpenShift cluster with fencing is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.
For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.
A two-node OpenShift cluster with fencing provides high availability (HA) with a reduced hardware footprint. This configuration is designed for distributed or edge environments where deploying a full three-node control plane cluster is not practical.
A two-node cluster does not include compute nodes. The two control plane machines run user workloads in addition to managing the cluster.
Fencing is managed by Pacemaker, which can isolate an unresponsive node by using the Baseboard Management Controller (BMC) of the node. After the unresponsive node is fenced, the remaining node can safely continue operating the cluster without the risk of resource corruption.
You can deploy a two-node OpenShift cluster with fencing by using either the user-provisioned infrastructure method or the installer-provisioned infrastructure method.
The two-node OpenShift cluster with fencing requires the following hosts:
Hosts | Description |
---|---|
Two control plane machines | The control plane machines run the Kubernetes and OpenShift Container Platform services that form the control plane. |
One temporary bootstrap machine | You need a bootstrap machine to deploy the OpenShift Container Platform cluster on the control plane machines. You can remove the bootstrap machine after you install the cluster. |
The bootstrap and control plane machines must use Red Hat Enterprise Linux CoreOS (RHCOS) as the operating system. For instructions on installing RHCOS and starting the bootstrap process, see Installing RHCOS and starting the OpenShift Container Platform bootstrap process.
The requirement to use RHCOS applies only to user-provisioned infrastructure deployments. For installer-provisioned infrastructure deployments, the bootstrap and control plane machines are provisioned automatically by the installation program, and you do not need to manually install RHCOS.
2.1.1. Minimum resource requirements for installing the two-node OpenShift cluster with fencing
Each cluster machine must meet the following minimum requirements:
Machine | Operating System | CPU [1] | RAM | Storage | Input/Output Per Second (IOPS) [2] |
---|---|---|---|---|---|
Bootstrap | RHCOS | 4 | 16 GB | 120 GB | 300 |
Control plane | RHCOS | 4 | 16 GB | 120 GB | 300 |
1. One CPU is equivalent to one physical core when simultaneous multithreading (SMT), or Hyper-Threading, is not enabled. When enabled, use the following formula to calculate the corresponding ratio: (threads per core × cores) × sockets = CPUs. (A worked example follows this list.)
2. OpenShift Container Platform and Kubernetes are sensitive to disk performance, and faster storage is recommended, particularly for etcd on the control plane nodes. Note that on many cloud platforms, storage size and IOPS scale together, so you might need to over-allocate storage volume to obtain sufficient performance.
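For example, a host with two sockets, eight cores per socket, and SMT enabled (two threads per core) provides (2 × 8) × 2 = 32 CPUs for the purpose of this sizing guidance.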
2.1.2. User-provisioned DNS requirements
In OpenShift Container Platform deployments, DNS name resolution is required for the following components:
- The Kubernetes API
- The OpenShift Container Platform application wildcard
- The bootstrap and control plane machines
Reverse DNS resolution is also required for the Kubernetes API, the bootstrap machine, and the control plane machines.
DNS A/AAAA or CNAME records are used for name resolution and PTR records are used for reverse name resolution. The reverse records are important because Red Hat Enterprise Linux CoreOS (RHCOS) uses the reverse records to set the hostnames for all the nodes, unless the hostnames are provided by DHCP. Additionally, the reverse records are used to generate the certificate signing requests (CSR) that OpenShift Container Platform needs to operate.
It is recommended to use a DHCP server to provide the hostnames to each cluster node. See the DHCP recommendations for user-provisioned infrastructure section for more information.
The following DNS records are required for a user-provisioned OpenShift Container Platform cluster and they must be in place before installation. In each record, <cluster_name> is the cluster name and <base_domain> is the base domain that you specify in the install-config.yaml file. A complete DNS record takes the form: <component>.<cluster_name>.<base_domain>.
Component | Record | Description |
---|---|---|
Kubernetes API | api.<cluster_name>.<base_domain>. | A DNS A/AAAA or CNAME record, and a DNS PTR record, to identify the API load balancer. These records must be resolvable by both clients external to the cluster and from all the nodes within the cluster. |
Kubernetes API | api-int.<cluster_name>.<base_domain>. | A DNS A/AAAA or CNAME record, and a DNS PTR record, to internally identify the API load balancer. These records must be resolvable from all the nodes within the cluster. Important: The API server must be able to resolve the worker nodes by the hostnames that are recorded in Kubernetes. If the API server cannot resolve the node names, then proxied API calls can fail, and you cannot retrieve logs from pods. |
Routes | *.apps.<cluster_name>.<base_domain>. | A wildcard DNS A/AAAA or CNAME record that refers to the application ingress load balancer. The application ingress load balancer targets the machines that run the Ingress Controller pods. By default, the Ingress Controller pods run on compute nodes. In cluster topologies without dedicated compute nodes, such as two-node or three-node clusters, the control plane nodes also carry the worker label, so the Ingress pods are scheduled on the control plane nodes. These records must be resolvable by both clients external to the cluster and from all the nodes within the cluster. For example, console-openshift-console.apps.<cluster_name>.<base_domain> is used as a wildcard route to the OpenShift Container Platform web console. |
Bootstrap machine | bootstrap.<cluster_name>.<base_domain>. | A DNS A/AAAA or CNAME record, and a DNS PTR record, to identify the bootstrap machine. These records must be resolvable by the nodes within the cluster. |
Control plane machines | <control_plane><n>.<cluster_name>.<base_domain>. | DNS A/AAAA or CNAME records and DNS PTR records to identify each machine for the control plane nodes. These records must be resolvable by the nodes within the cluster. |
In OpenShift Container Platform 4.4 and later, you do not need to specify etcd host and SRV records in your DNS configuration.
You can use the dig command to verify name and reverse name resolution. See the section on Validating DNS resolution for user-provisioned infrastructure for detailed validation steps.
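As a quick illustration, assuming your DNS server is reachable at <nameserver_ip>, the following commands spot-check one forward record and one reverse record; the placeholders are illustrative and not part of the required configuration:

$ dig +noall +answer @<nameserver_ip> api.<cluster_name>.<base_domain>
$ dig +noall +answer @<nameserver_ip> -x <api_load_balancer_ip>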
2.1.2.1. Example DNS configuration for user-provisioned clusters
This section provides A and PTR record configuration samples that meet the DNS requirements for deploying OpenShift Container Platform on user-provisioned infrastructure. The samples are not meant to provide advice for choosing one DNS solution over another.
In the examples, the cluster name is ocp4 and the base domain is example.com.
In a two-node cluster with fencing, the control plane machines are also schedulable worker nodes. The DNS configuration must therefore include only the two control plane nodes. If you later add compute machines, provide corresponding A and PTR records for them as in a standard user-provisioned installation.
Example DNS A record configuration for a user-provisioned cluster
The following example is a BIND zone file that shows sample A records for name resolution in a user-provisioned cluster.
Example 2.1. Sample DNS zone database
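A minimal sketch of such a zone file, assuming the name server and the API and ingress records all resolve to a single load balancer at 192.168.1.5, with illustrative node addresses:

$TTL 1W
@	IN	SOA	ns1.example.com.	root (
			2024010101	; serial
			3H		; refresh (3 hours)
			30M		; retry (30 minutes)
			2W		; expiry (2 weeks)
			1W )		; minimum (1 week)
	IN	NS	ns1.example.com.
;
ns1.example.com.			IN	A	192.168.1.5
;
api.ocp4.example.com.			IN	A	192.168.1.5
api-int.ocp4.example.com.		IN	A	192.168.1.5
;
*.apps.ocp4.example.com.		IN	A	192.168.1.5
;
bootstrap.ocp4.example.com.		IN	A	192.168.1.96
;
control-plane0.ocp4.example.com.	IN	A	192.168.1.97
control-plane1.ocp4.example.com.	IN	A	192.168.1.98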
- api.ocp4.example.com.: Provides name resolution for the Kubernetes API. The record refers to the IP address of the API load balancer.
- api-int.ocp4.example.com.: Provides name resolution for the Kubernetes API. The record refers to the IP address of the API load balancer and is used for internal cluster communications.
- *.apps.ocp4.example.com.: Provides name resolution for the wildcard routes. The record refers to the IP address of the application ingress load balancer. The application ingress load balancer targets the machines that run the Ingress Controller pods.
  Note: In the example, the same load balancer is used for the Kubernetes API and application ingress traffic. In production scenarios, you can deploy the API and application ingress load balancers separately so that you can scale the load balancer infrastructure for each in isolation.
- bootstrap.ocp4.example.com.: Provides name resolution for the bootstrap machine.
- control-plane0.ocp4.example.com. and control-plane1.ocp4.example.com.: Provide name resolution for the control plane machines.
Example DNS PTR record configuration for a user-provisioned cluster
The following example BIND zone file shows sample PTR records for reverse name resolution in a user-provisioned cluster.
Example 2.2. Sample DNS zone database for reverse records
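A minimal sketch of the corresponding reverse zone, reusing the illustrative addresses from the forward-zone sketch:

$TTL 1W
@	IN	SOA	ns1.example.com.	root (
			2024010101	; serial
			3H		; refresh (3 hours)
			30M		; retry (30 minutes)
			2W		; expiry (2 weeks)
			1W )		; minimum (1 week)
	IN	NS	ns1.example.com.
;
5.1.168.192.in-addr.arpa.	IN	PTR	api.ocp4.example.com.
5.1.168.192.in-addr.arpa.	IN	PTR	api-int.ocp4.example.com.
;
96.1.168.192.in-addr.arpa.	IN	PTR	bootstrap.ocp4.example.com.
;
97.1.168.192.in-addr.arpa.	IN	PTR	control-plane0.ocp4.example.com.
98.1.168.192.in-addr.arpa.	IN	PTR	control-plane1.ocp4.example.com.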
- api.ocp4.example.com.: Provides reverse DNS resolution for the Kubernetes API. The PTR record refers to the record name of the API load balancer.
- api-int.ocp4.example.com.: Provides reverse DNS resolution for the Kubernetes API. The PTR record refers to the record name of the API load balancer and is used for internal cluster communications.
- bootstrap.ocp4.example.com.: Provides reverse DNS resolution for the bootstrap machine.
- control-plane0.ocp4.example.com. and control-plane1.ocp4.example.com.: Provide reverse DNS resolution for the control plane machines.
A PTR record is not required for the OpenShift Container Platform application wildcard.
2.1.3. Installer-provisioned DNS requirements
Clients access the OpenShift Container Platform cluster nodes over the baremetal network. A network administrator must configure a subdomain or subzone where the canonical name extension is the cluster name:

<cluster_name>.<base_domain>

For example:

test-cluster.example.com
OpenShift Container Platform includes functionality that uses cluster membership information to generate A/AAAA records. This resolves the node names to their IP addresses. After the nodes are registered with the API, the cluster can disperse node information without using CoreDNS-mDNS. This eliminates the network traffic associated with multicast DNS.
CoreDNS requires both TCP and UDP connections to the upstream DNS server to function correctly. Ensure the upstream DNS server can receive both TCP and UDP connections from OpenShift Container Platform cluster nodes.
In OpenShift Container Platform deployments, DNS name resolution is required for the following components:
- The Kubernetes API
- The OpenShift Container Platform application wildcard ingress API
A/AAAA records are used for name resolution and PTR records are used for reverse name resolution. Red Hat Enterprise Linux CoreOS (RHCOS) uses the reverse records or DHCP to set the hostnames for all the nodes.
Installer-provisioned installation includes functionality that uses cluster membership information to generate A/AAAA records. This resolves the node names to their IP addresses. In each record, <cluster_name> is the cluster name and <base_domain> is the base domain that you specify in the install-config.yaml file. A complete DNS record takes the form: <component>.<cluster_name>.<base_domain>.
Component | Record | Description |
---|---|---|
Kubernetes API | api.<cluster_name>.<base_domain>. | An A/AAAA record and a PTR record identify the API load balancer. These records must be resolvable by both clients external to the cluster and from all the nodes within the cluster. |
Routes | *.apps.<cluster_name>.<base_domain>. | The wildcard A/AAAA record refers to the application ingress load balancer. The application ingress load balancer targets the nodes that run the Ingress Controller pods. The Ingress Controller pods run on the worker nodes by default. These records must be resolvable by both clients external to the cluster and from all the nodes within the cluster. For example, console-openshift-console.apps.<cluster_name>.<base_domain> is used as a wildcard route to the OpenShift Container Platform web console. |
You can use the dig command to verify DNS resolution.
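For example, a quick spot check of the API and wildcard records might look like the following, where <nameserver_ip> and the test route name are illustrative placeholders:

$ dig +noall +answer @<nameserver_ip> api.<cluster_name>.<base_domain>
$ dig +noall +answer @<nameserver_ip> test.apps.<cluster_name>.<base_domain>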
2.1.4. Configuring an Ingress load balancer for a two-node cluster with fencing
You must configure an external Ingress load balancer (LB) before you install a two-node OpenShift cluster with fencing. The Ingress LB forwards external application traffic to the Ingress Controller pods that run on the control plane nodes. Both nodes can actively receive traffic.
Prerequisites
- You have two control plane nodes with fencing enabled.
- You have network connectivity from the load balancer to both control plane nodes.
- You created DNS records for api.<cluster_name>.<base_domain> and *.apps.<cluster_name>.<base_domain>.
- You have an external load balancer that supports health checks on endpoints.
Procedure
- Configure the load balancer to forward traffic for the following ports:
  - 6443: Kubernetes API server
  - 80 and 443: Application ingress
  You must forward traffic to both control plane nodes.
- Configure health checks on the load balancer. You must monitor the backend endpoints so that the load balancer only sends traffic to nodes that respond.
- Configure the load balancer to forward traffic to both control plane nodes. The following example shows how to configure two control plane nodes:
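As an illustration only, the following HAProxy fragment shows one way to satisfy this step. HAProxy itself, the node names, and the addresses 192.168.111.20 and 192.168.111.21 are assumptions rather than requirements; any load balancer with equivalent TCP forwarding and health checks works:

# haproxy.cfg fragment (illustrative)
defaults
    mode tcp
    timeout connect 10s
    timeout client  1m
    timeout server  1m

frontend api
    bind *:6443
    default_backend api

backend api
    option httpchk GET /readyz HTTP/1.0
    option log-health-checks
    balance roundrobin
    server control-plane0 192.168.111.20:6443 check check-ssl verify none inter 1s fall 2 rise 3
    server control-plane1 192.168.111.21:6443 check check-ssl verify none inter 1s fall 2 rise 3

frontend ingress-http
    bind *:80
    default_backend ingress-http

backend ingress-http
    balance source
    server control-plane0 192.168.111.20:80 check inter 1s
    server control-plane1 192.168.111.21:80 check inter 1s

frontend ingress-https
    bind *:443
    default_backend ingress-https

backend ingress-https
    balance source
    server control-plane0 192.168.111.20:443 check inter 1s
    server control-plane1 192.168.111.21:443 check inter 1s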
- Verify the load balancer configuration:
  - From an external client, run the following command:

    $ curl -k https://api.<cluster_name>.<base_domain>:6443/version

  - From an external client, access an application route by running the following command:

    $ curl https://<app>.<cluster_name>.<base_domain>
You can shut down a control plane node and verify that the load balancer stops sending traffic to that node while the other node continues to serve requests.
2.1.5. Creating a manifest object for a customized br-ex bridge
You must create a manifest object to modify the cluster’s network configuration after installation. The manifest configures the br-ex bridge, which manages external network connectivity for the cluster.
For instructions on creating this manifest, see "Creating a manifest file for a customized br-ex bridge".
2.2. Installing a two-node OpenShift cluster with fencing
Two-node OpenShift cluster with fencing is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.
For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.
You can deploy a two-node OpenShift cluster with fencing by using either the installer-provisioned infrastructure or the user-provisioned infrastructure installation method. The following examples provide sample install-config.yaml configurations for both methods.
2.2.1. Sample install-config.yaml for a two-node installer-provisioned infrastructure cluster with fencing
You can use the following install-config.yaml configuration as a template for deploying a two-node OpenShift cluster with fencing by using the installer-provisioned infrastructure method:
Do an etcd backup before proceeding to ensure that you can restore the cluster if any issues occur.
Sample install-config.yaml configuration
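Because the fencing configuration is a Technology Preview, the exact schema can change between releases. The following is a minimal, illustrative sketch only: the cluster name, domain, network values, and BMC details are placeholders, the nesting of the fencing block under controlPlane mirrors the callouts that follow but should be verified against your release, and the platform.baremetal.hosts entries that an installer-provisioned deployment also requires are omitted for brevity:

apiVersion: v1
baseDomain: example.com
metadata:
  name: ocp-tnf                          # used as a prefix for hostnames and DNS records
compute:
- name: worker
  replicas: 0                            # no worker nodes in a two-node fencing cluster
controlPlane:
  name: master
  replicas: 2                            # two-node control plane
  fencing:
    credentials:                         # BMC credentials used for node fencing
    - hostname: control-plane0
      address: <bmc_redfish_url_0>
      username: <bmc_username>
      password: <bmc_password>
      certificateVerification: Disabled  # use Enabled for CA-signed Redfish certificates
    - hostname: control-plane1
      address: <bmc_redfish_url_1>
      username: <bmc_username>
      password: <bmc_password>
      certificateVerification: Disabled
featureSet: TechPreviewNoUpgrade         # required to enable two-node deployments
networking:
  machineNetwork:
  - cidr: 192.168.111.0/24
platform:
  baremetal:
    apiVIPs:
    - 192.168.111.5                      # virtual IP for the API endpoint
    ingressVIPs:
    - 192.168.111.4                      # virtual IP for the Ingress endpoint
pullSecret: '<pull_secret>'
sshKey: '<ssh_public_key>'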
- compute.replicas: Set this field to 0 because a two-node fencing cluster does not include worker nodes.
- controlPlane.replicas: Set this field to 2 for a two-node fencing deployment.
- fencing.credentials.hostname: Provide the Baseboard Management Controller (BMC) credentials for each control plane node. These credentials are required for node fencing and prevent split-brain scenarios.
- fencing.credentials.certificateVerification: Set this field to Disabled if your Redfish URL uses self-signed certificates, which is common for internally-hosted endpoints. Set this field to Enabled for URLs with valid CA-signed certificates.
- metadata.name: The cluster name is used as a prefix for hostnames and DNS records.
- featureSet: Set this field to TechPreviewNoUpgrade to enable two-node OpenShift cluster deployments.
- platform.baremetal.apiVIPs and platform.baremetal.ingressVIPs: Virtual IPs for the API and Ingress endpoints. Ensure that they are reachable by all nodes and external clients.
- pullSecret: Contains the credentials required to pull container images for the cluster components.
- sshKey: The SSH public key for accessing cluster nodes after installation.
2.2.2. Sample install-config.yaml for a two-node user-provisioned infrastructure cluster with fencing
You can use the following install-config.yaml configuration as a template for deploying a two-node OpenShift cluster with fencing by using the user-provisioned infrastructure method:
Do an etcd backup before proceeding to ensure that you can restore the cluster if any issues occur.
Sample install-config.yaml configuration
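Again, this is only an illustrative sketch under the same assumptions as the installer-provisioned example (placeholder values, a fencing block nested under controlPlane, and a schema that can change while the feature is in Technology Preview); the key difference is the none platform:

apiVersion: v1
baseDomain: example.com
metadata:
  name: ocp-tnf
compute:
- name: worker
  replicas: 0                        # no worker nodes in a two-node fencing cluster
controlPlane:
  name: master
  replicas: 2                        # two-node control plane
  fencing:
    credentials:                     # BMC credentials used for node fencing
    - hostname: control-plane0
      address: <bmc_redfish_url_0>
      username: <bmc_username>
      password: <bmc_password>
    - hostname: control-plane1
      address: <bmc_redfish_url_1>
      username: <bmc_username>
      password: <bmc_password>
featureSet: TechPreviewNoUpgrade     # required to enable two-node deployments
platform:
  none: {}                           # hosts are pre-provisioned outside the installer
pullSecret: '<pull_secret>'
sshKey: '<ssh_public_key>'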
- compute.replicas: Set this field to 0 because a two-node fencing cluster does not include worker nodes.
- controlPlane.replicas: Set this field to 2 for a two-node fencing deployment.
- fencing.credentials.hostname: Provide BMC credentials for each control plane node.
- metadata.name: The cluster name is used as a prefix for hostnames and DNS records.
- featureSet: Enables two-node OpenShift cluster deployments.
- platform.none: Set the platform to none for user-provisioned infrastructure deployments. Bare-metal hosts are pre-provisioned outside of the installation program.
- pullSecret: Contains the credentials required to pull container images for the cluster components.
- sshKey: The SSH public key for accessing cluster nodes after installation.
2.3. Post-installation troubleshooting and recovery
Two-node OpenShift cluster with fencing is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.
For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.
Use the following sections to help you recover from issues in a two-node OpenShift cluster with fencing.
2.3.3. Replacing control plane nodes in a two-node OpenShift cluster with fencing
You can replace a failed control plane node in a two-node OpenShift cluster. The replacement node must use the same host name and IP address as the failed node.
Prerequisites
- You have a functioning survivor control plane node.
- You have verified that either the machine is not running or the node is not ready.
- You have access to the cluster as a user with the cluster-admin role.
- You know the host name and IP address of the failed node.
Do an etcd backup before proceeding to ensure that you can restore the cluster if any issues occur.
Procedure
Check the quorum state by running the following command:

$ sudo pcs quorum status

If quorum is lost and one control plane node is still running, restore quorum manually on the survivor node by running the following command:

$ sudo pcs quorum unblock

If only one node failed, verify that etcd is running on the survivor node by running the following command:

$ sudo pcs resource status etcd

If etcd is not running, restart etcd by running the following command:

$ sudo pcs resource cleanup etcd

If etcd still does not start, force it manually on the survivor node, skipping fencing:

Important: Before running these commands, ensure that the node being replaced is inaccessible. Otherwise, you risk etcd corruption.

$ sudo pcs resource debug-stop etcd
$ sudo OCF_RESKEY_CRM_meta_notify_start_resource='etcd' pcs resource debug-start etcd

After recovery, etcd must be running successfully on the survivor node.
Delete the etcd secrets for the failed node by running the following commands:

$ oc project openshift-etcd
$ oc delete secret etcd-peer-<node_name>
$ oc delete secret etcd-serving-<node_name>
$ oc delete secret etcd-serving-metrics-<node_name>

Note: To replace the failed node, you must delete its etcd secrets first. When etcd is running, it might take some time for the API server to respond to these commands.
Delete resources for the failed node:

- If you have BareMetalHost (BMH) objects, list them to identify the host that you are replacing by running the following command:

  $ oc get bmh -n openshift-machine-api

- Delete the BMH object for the failed node by running the following command:

  $ oc delete bmh/<bmh_name> -n openshift-machine-api

- List the Machine objects to identify the object that maps to the node that you are replacing by running the following command:

  $ oc get machines.machine.openshift.io -n openshift-machine-api

- Get the label with the machine hash value from the Machine object by running the following command:

  $ oc get machines.machine.openshift.io/<machine_name> -n openshift-machine-api \
    -o jsonpath='Machine hash label: {.metadata.labels.machine\.openshift\.io/cluster-api-cluster}{"\n"}'

  Replace <machine_name> with the name of a Machine object in your cluster. For example, ostest-bfs7w-ctrlplane-0. You need this label to provision a new Machine object.

- Delete the Machine object for the failed node by running the following command:

  $ oc delete machines.machine.openshift.io/<machine_name>-<failed_node_name> -n openshift-machine-api

  Note: The node object is deleted automatically after deleting the Machine object.
Recreate the failed host by using the same name and IP address:

Important: You must perform this step only if you are using installer-provisioned infrastructure or the Machine API to create the original node. For information about replacing a failed bare-metal control plane node, see "Replacing an unhealthy etcd member on bare metal".

- Remove the BMH and Machine objects. The machine controller automatically deletes the node object.
- Provision a new machine by using the following sample configuration:
Example Machine object configuration
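The following is an illustrative sketch of such a Machine object for a bare-metal control plane host. The values in curly braces correspond to the callouts after the sketch, and the remaining provider-specific fields (for example, the userData secret name) are assumptions to adapt to your environment:

apiVersion: machine.openshift.io/v1beta1
kind: Machine
metadata:
  annotations:
    metal3.io/BareMetalHost: openshift-machine-api/{bmh_name}
  labels:
    machine.openshift.io/cluster-api-cluster: {machine_hash_label}
    machine.openshift.io/cluster-api-machine-role: master
    machine.openshift.io/cluster-api-machine-type: master
  name: {machine_name}
  namespace: openshift-machine-api
spec:
  metadata: {}
  providerSpec:
    value:
      apiVersion: baremetal.cluster.k8s.io/v1alpha1
      kind: BareMetalMachineProviderSpec
      customDeploy:
        method: install_coreos
      image:
        checksum: ""
        url: ""
      userData:
        name: master-user-data-managed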
- metadata.annotations.metal3.io/BareMetalHost: Replace {bmh_name} with the name of the BMH object that is associated with the host that you are replacing.
- labels.machine.openshift.io/cluster-api-cluster: Replace {machine_hash_label} with the label that you fetched from the machine that you deleted.
- metadata.name: Replace {machine_name} with the name of the machine that you deleted.
- Create the new BMH object and the secret to store the BMC credentials by running the following command:
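For example, you might apply manifests such as the following with oc apply -f. The secret name, BMC address format, and boolean settings shown here are illustrative assumptions; the values to substitute are described in the callouts after the sketch:

apiVersion: v1
kind: Secret
metadata:
  name: control-plane-bmc-secret          # referenced by bmc.credentialsName below
  namespace: openshift-machine-api
type: Opaque
data:
  username: <base64_encoded_bmc_username>
  password: <base64_encoded_bmc_password>
---
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: {bmh_name}
  namespace: openshift-machine-api
spec:
  automatedCleaningMode: disabled
  bmc:
    address: redfish://<bmc_ip>/redfish/v1/Systems/{uuid}
    credentialsName: control-plane-bmc-secret
    disableCertificateVerification: true
  bootMACAddress: <provisioning_nic_mac_address>
  online: true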
- metadata.name (Secret): Specify the name of the secret.
- metadata.name (BareMetalHost): Replace {bmh_name} with the name of the BMH object that you deleted.
- bmc.address: Replace {uuid} with the UUID of the node that you created.
- bmc.credentialsName: Replace name with the name of the secret that you created.
- bootMACAddress: Specify the MAC address of the provisioning network interface. This is the MAC address the node uses to identify itself when communicating with Ironic during provisioning.
Verify that the new node has reached the Provisioned state by running the following command:

$ oc get bmh -o wide

The value of the STATUS column in the output of this command must be Provisioned.

Note: The provisioning process can take 10 to 20 minutes to complete.

Verify that both control plane nodes are in the Ready state by running the following command:

$ oc get nodes

The value of the STATUS column in the output of this command must be Ready for both nodes.

Apply the detached annotation to the BMH object to prevent the Machine API from managing it by running the following command:

$ oc annotate bmh <bmh_name> -n openshift-machine-api baremetalhost.metal3.io/detached='' --overwrite
Rejoin the replacement node to the Pacemaker cluster by running the following commands:

Note: Run the following commands on the survivor control plane node, not on the node being replaced.

$ sudo pcs cluster node remove <node_name>
$ sudo pcs cluster node add <node_name> addr=<node_ip> --start --enable

Delete stale jobs for the failed node by running the following commands:

$ oc project openshift-etcd
$ oc delete job tnf-auth-job-<node_name>
$ oc delete job tnf-after-setup-job-<node_name>
Verification
For information about verifying that both control plane nodes and etcd are operating correctly, see "Verifying etcd health in a two-node OpenShift cluster with fencing".
2.3.5. Verifying etcd health in a two-node OpenShift cluster with fencing
After completing node recovery or maintenance procedures, verify that both control plane nodes and etcd are operating correctly.
Prerequisites
- You have access to the cluster as a user with cluster-admin privileges.
- You can access at least one control plane node through SSH.
Procedure
Check the overall node status by running the following command:

$ oc get nodes

This command verifies that both control plane nodes are in the Ready state, indicating that they can receive workloads for scheduling.

Verify the status of the cluster-etcd-operator by running the following command:

$ oc describe co/etcd

The cluster-etcd-operator manages and reports on the health of your etcd setup. Reviewing its status helps you identify any ongoing issues or degraded conditions.

Review the etcd member list by running the following command:

$ oc rsh -n openshift-etcd <etcd_pod> etcdctl member list -w table

This command shows the current etcd members and their roles. Look for any nodes marked as learner, which indicates that they are in the process of becoming voting members.

Review the Pacemaker resource status by running the following command on either control plane node:

$ sudo pcs status --full

This command provides a detailed overview of all resources managed by Pacemaker. You must ensure that the following conditions are met:
- Both nodes are online.
- The kubelet and etcd resources are running.
- Fencing is correctly configured for both nodes.
Legal Notice
Copyright © 2025 Red Hat
OpenShift documentation is licensed under the Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0).
Modified versions must remove all Red Hat trademarks.
Portions adapted from https://github.com/kubernetes-incubator/service-catalog/ with modifications by Red Hat.
Red Hat, Red Hat Enterprise Linux, the Red Hat logo, the Shadowman logo, JBoss, OpenShift, Fedora, the Infinity logo, and RHCE are trademarks of Red Hat, Inc., registered in the United States and other countries.
Linux® is the registered trademark of Linus Torvalds in the United States and other countries.
Java® is a registered trademark of Oracle and/or its affiliates.
XFS® is a trademark of Silicon Graphics International Corp. or its subsidiaries in the United States and/or other countries.
MySQL® is a registered trademark of MySQL AB in the United States, the European Union and other countries.
Node.js® is an official trademark of Joyent. Red Hat Software Collections is not formally related to or endorsed by the official Joyent Node.js open source or commercial project.
The OpenStack® Word Mark and OpenStack logo are either registered trademarks/service marks or trademarks/service marks of the OpenStack Foundation, in the United States and other countries and are used with the OpenStack Foundation’s permission. We are not affiliated with, endorsed or sponsored by the OpenStack Foundation, or the OpenStack community.
All other trademarks are the property of their respective owners.