Chapter 12. Expanding the cluster
You can expand a cluster installed with the Assisted Installer by adding hosts using the user interface or the API.
12.1. Prerequisites
- You must have access to an Assisted Installer cluster.
- You must install the OpenShift CLI (oc).
- Ensure that all the required DNS records exist for the cluster that you are adding the worker node to.
- If you are adding a worker node to a cluster with multiple CPU architectures, you must ensure that the architecture is set to multi.
- If you are adding arm64, IBM Power, or IBM zSystems compute nodes to an existing x86_64 cluster, use a platform that supports a mixed architecture. For details, see Installing a mixed-architecture cluster.
12.2. Checking for multiple architectures
When adding a node to a cluster with multiple architectures, ensure that the architecture setting is set to multi.
Procedure
- Log in to the cluster using the CLI.
- Check the architecture setting:

$ oc adm release info -o json | jq .metadata.metadata

Ensure that the architecture setting is set to multi:

{
  "release.openshift.io/architecture": "multi"
}
12.3. Adding hosts with the UI
You can add hosts to clusters that were created using the Assisted Installer.
Adding hosts to Assisted Installer clusters is only supported for clusters running OpenShift Container Platform version 4.11 and later.
Procedure
- Log in to OpenShift Cluster Manager and click the cluster that you want to expand.
- Click Add hosts and download the discovery ISO for the new host, adding an SSH public key and configuring cluster-wide proxy settings as needed.
- Optional: Modify ignition files as needed.
- Boot the target host using the discovery ISO, and wait for the host to be discovered in the console.
- Select the host role. It can be either a worker or a control plane host.
- Start the installation.
As the installation proceeds, the installation generates pending certificate signing requests (CSRs) for the host. When prompted, approve the pending CSRs to complete the installation.
When the host is successfully installed, it is listed as a host in the cluster web console.
New hosts will be encrypted using the same method as the original cluster.
12.4. Adding hosts with the API
You can add hosts to clusters using the Assisted Installer REST API.
Prerequisites
- Install the OpenShift Cluster Manager CLI (ocm).
- Log in to OpenShift Cluster Manager as a user with cluster creation privileges.
- Install jq.
- Ensure that all the required DNS records exist for the cluster that you want to expand.
Procedure
- Authenticate against the Assisted Installer REST API and generate an API token for your session. The generated token is valid for 15 minutes only.
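The token-generation command itself is not shown in this step; a minimal sketch, assuming you have set $OFFLINE_TOKEN to the offline token copied from console.redhat.com, is to exchange it for an access token through Red Hat SSO:

$ export API_TOKEN=$( \
    curl --silent \
    --header "Accept: application/json" \
    --header "Content-Type: application/x-www-form-urlencoded" \
    --data-urlencode "grant_type=refresh_token" \
    --data-urlencode "client_id=cloud-services" \
    --data-urlencode "refresh_token=${OFFLINE_TOKEN}" \
    "https://sso.redhat.com/auth/realms/redhat-external/protocol/openid-connect/token" \
    | jq --raw-output ".access_token" \
  )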
- Set the $API_URL variable by running the following command:

$ export API_URL=<api_url> 1

1 Replace <api_url> with the Assisted Installer API URL, for example, https://api.openshift.com
Import the cluster by running the following commands:
  - Set the $CLUSTER_ID variable. Log in to the cluster and run the following command:

$ export CLUSTER_ID=$(oc get clusterversion -o jsonpath='{.items[].spec.clusterID}')

  - Set the $CLUSTER_REQUEST variable that is used to import the cluster:

$ export CLUSTER_REQUEST=$(jq --null-input --arg openshift_cluster_id "$CLUSTER_ID" '{
  "api_vip_dnsname": "<api_vip>", 1
  "openshift_cluster_id": $openshift_cluster_id,
  "name": "<openshift_cluster_name>" 2
}')

1 Replace <api_vip> with the hostname for the cluster's API server. This can be the DNS domain for the API server or the IP address of the single node which the host can reach. For example, api.compute-1.example.com.
2 Replace <openshift_cluster_name> with the plain text name for the cluster. The cluster name should match the cluster name that was set during the Day 1 cluster installation.
  - Import the cluster and set the $CLUSTER_ID variable by running the following command:

$ CLUSTER_ID=$(curl "$API_URL/api/assisted-install/v2/clusters/import" -H "Authorization: Bearer ${API_TOKEN}" -H 'accept: application/json' -H 'Content-Type: application/json' \
  -d "$CLUSTER_REQUEST" | tee /dev/stderr | jq -r '.id')
- Generate the InfraEnv resource for the cluster and set the $INFRA_ENV_ID variable by running the following commands:
  - Download the pull secret file from Red Hat OpenShift Cluster Manager at console.redhat.com.
  - Set the $INFRA_ENV_REQUEST variable:
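The payload itself is elided above. The following is a minimal sketch of how the variable can be assembled with jq, assuming the standard v2 infra-envs request fields (name, pull_secret, cluster_id, ssh_authorized_key, image_type); the callouts below explain the placeholders:

$ export INFRA_ENV_REQUEST=$(jq --null-input \
    --slurpfile pull_secret <path_to_pull_secret_file> \
    --arg ssh_pub_key "$(cat <path_to_ssh_pub_key>)" \
    --arg cluster_id "$CLUSTER_ID" '{
  "name": "<infraenv_name>",
  "pull_secret": $pull_secret[0] | tojson,
  "cluster_id": $cluster_id,
  "ssh_authorized_key": $ssh_pub_key,
  "image_type": "<iso_image_type>"
}')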
1 Replace <path_to_pull_secret_file> with the path to the local file containing the downloaded pull secret from Red Hat OpenShift Cluster Manager at console.redhat.com.
2 Replace <path_to_ssh_pub_key> with the path to the public SSH key required to access the host. If you do not set this value, you cannot access the host while in discovery mode.
3 Replace <infraenv_name> with the plain text name for the InfraEnv resource.
4 Replace <iso_image_type> with the ISO image type, either full-iso or minimal-iso.
  - Post the $INFRA_ENV_REQUEST to the /v2/infra-envs API and set the $INFRA_ENV_ID variable:

$ INFRA_ENV_ID=$(curl "$API_URL/api/assisted-install/v2/infra-envs" -H "Authorization: Bearer ${API_TOKEN}" -H 'accept: application/json' -H 'Content-Type: application/json' -d "$INFRA_ENV_REQUEST" | tee /dev/stderr | jq -r '.id')
- Get the URL of the discovery ISO for the cluster host by running the following command:

$ curl -s "$API_URL/api/assisted-install/v2/infra-envs/$INFRA_ENV_ID" -H "Authorization: Bearer ${API_TOKEN}" | jq -r '.download_url'

Example output

https://api.openshift.com/api/assisted-images/images/41b91e72-c33e-42ee-b80f-b5c5bbf6431a?arch=x86_64&image_token=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJleHAiOjE2NTYwMjYzNzEsInN1YiI6IjQxYjkxZTcyLWMzM2UtNDJlZS1iODBmLWI1YzViYmY2NDMxYSJ9.1EX_VGaMNejMhrAvVRBS7PDPIQtbOOc8LtG8OukE1a4&type=minimal-iso&version=4.12

- Download the ISO:

$ curl -L -s '<iso_url>' --output rhcos-live-minimal.iso 1

1 Replace <iso_url> with the URL for the ISO from the previous step.
- Boot the new worker host from the downloaded rhcos-live-minimal.iso.
- Get the list of hosts in the cluster that are not installed. Keep running the following command until the new host shows up:
$ curl -s "$API_URL/api/assisted-install/v2/clusters/$CLUSTER_ID" -H "Authorization: Bearer ${API_TOKEN}" | jq -r '.hosts[] | select(.status != "installed").id'

Example output

2294ba03-c264-4f11-ac08-2f1bb2f8c296

- Set the $HOST_ID variable for the new host, for example:

$ HOST_ID=<host_id> 1

1 Replace <host_id> with the host ID from the previous step.
- Check that the host is ready to install by running the following command:

Note: Ensure that you copy the entire command including the complete jq expression.
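The full command with its jq expression is elided above. As a minimal alternative sketch, assuming the v2 infra-envs hosts endpoint, you can query the host directly; a host that is ready to install reports the status known:

$ curl -s "$API_URL/api/assisted-install/v2/infra-envs/$INFRA_ENV_ID/hosts/$HOST_ID" \
    -H "Authorization: Bearer ${API_TOKEN}" | jq '{id, status, status_info}'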
- When the previous command shows that the host is ready, start the installation using the /v2/infra-envs/{infra_env_id}/hosts/{host_id}/actions/install API by running the following command:
$ curl -X POST -s "$API_URL/api/assisted-install/v2/infra-envs/$INFRA_ENV_ID/hosts/$HOST_ID/actions/install" -H "Authorization: Bearer ${API_TOKEN}"

As the installation proceeds, it generates pending certificate signing requests (CSRs) for the host.

Important: You must approve the CSRs to complete the installation.
- Keep running the following API call to monitor the cluster installation:
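The monitoring command is elided above; a minimal sketch that polls the host statuses through the same v2 clusters endpoint used earlier in this procedure:

$ curl -s "$API_URL/api/assisted-install/v2/clusters/$CLUSTER_ID" \
    -H "Authorization: Bearer ${API_TOKEN}" \
    | jq -r '.hosts[] | [.id, .status, .status_info] | @tsv'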
- Optional: Run the following command to see all the events for the cluster:
$ curl -s "$API_URL/api/assisted-install/v2/events?cluster_id=$CLUSTER_ID" -H "Authorization: Bearer ${API_TOKEN}" | jq -c '.[] | {severity, message, event_time, host_id}'

- Log in to the cluster and approve the pending CSRs to complete the installation.
Verification
- Check that the new host was successfully added to the cluster with a status of Ready:

$ oc get nodes

Example output

NAME                          STATUS   ROLES           AGE   VERSION
control-plane-1.example.com   Ready    master,worker   56m   v1.25.0
compute-1.example.com         Ready    worker          11m   v1.25.0
12.5. Installing a mixed-architecture cluster
In OpenShift Container Platform version 4.12.0 and later, a cluster with an x86_64 control plane can support mixed-architecture worker nodes of two different CPU architectures. Mixed-architecture clusters combine the strengths of each architecture and support a variety of workloads.
From version 4.12.0, you can add arm64 worker nodes to an existing OpenShift cluster with an x86_64 control plane. From version 4.14.0, you can add IBM Power or IBM zSystems worker nodes to an existing x86_64 control plane.
The main steps of the installation are as follows:
- Create and register a multi-architecture cluster.
- Create an x86_64 infrastructure environment, download the ISO for x86_64, and add the control plane. The control plane must have the x86_64 architecture.
- Create an arm64, IBM Power, or IBM zSystems infrastructure environment, download the ISO for arm64, IBM Power, or IBM zSystems, and add the worker nodes.
These steps are detailed in the procedure below.
Supported platforms
The table below lists the platforms that support a mixed-architecture cluster for each OpenShift Container Platform version. Use the appropriate platforms for the version you are installing.
| OpenShift Container Platform version | Supported platforms | Day 1 control plane architecture | Day 2 node architecture |
|---|---|---|---|
| 4.12.0 | Microsoft Azure (Technology Preview) | x86_64 | arm64 |
| 4.13.0 | Bare metal (Technology Preview), Microsoft Azure | x86_64 | arm64 |
| 4.14.0 | Bare metal, Microsoft Azure | x86_64 | arm64, IBM Power, IBM zSystems |
Technology Preview (TP) features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.
For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.
Main steps
- Start the procedure for installing OpenShift Container Platform using the API. For details, see Installing with the Assisted Installer API in the Additional Resources section.
- When you reach the "Registering a new cluster" step of the installation, register the cluster as a multi-architecture cluster:
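The registration payload is elided above. A minimal sketch of a multi-architecture registration request, assuming the v2 clusters endpoint and placeholder values for the cluster name, version, base domain, and pull secret path:

$ curl -s -X POST "$API_URL/api/assisted-install/v2/clusters" \
    -H "Authorization: Bearer ${API_TOKEN}" \
    -H "Content-Type: application/json" \
    -d "$(jq --null-input --slurpfile pull_secret <path_to_pull_secret_file> '{
      "name": "<cluster_name>",
      "openshift_version": "<openshift_version>",
      "cpu_architecture": "multi",
      "base_dns_domain": "<base_dns_domain>",
      "pull_secret": $pull_secret[0] | tojson
    }')" | jq -r '.id'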
- When you reach the "Registering a new infrastructure environment" step of the installation, set cpu_architecture to x86_64:
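The payload is again elided; a sketch of an infrastructure environment registration with cpu_architecture set to x86_64, reusing the $CLUSTER_ID and pull secret placeholders from the previous steps:

$ curl -s -X POST "$API_URL/api/assisted-install/v2/infra-envs" \
    -H "Authorization: Bearer ${API_TOKEN}" \
    -H "Content-Type: application/json" \
    -d "$(jq --null-input --slurpfile pull_secret <path_to_pull_secret_file> \
         --arg cluster_id "$CLUSTER_ID" '{
      "name": "<infraenv_name>",
      "cpu_architecture": "x86_64",
      "cluster_id": $cluster_id,
      "pull_secret": $pull_secret[0] | tojson
    }')" | jq -r '.id'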
- When you reach the "Adding hosts" step of the installation, set host_role to master:

Note: For more information, see Assigning Roles to Hosts in Additional Resources.
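The request body is elided above; a sketch that sets the role on a discovered host through the v2 hosts endpoint, where <host_id> is the ID reported for the discovered host:

$ curl -s -X PATCH "$API_URL/api/assisted-install/v2/infra-envs/$INFRA_ENV_ID/hosts/<host_id>" \
    -H "Authorization: Bearer ${API_TOKEN}" \
    -H "Content-Type: application/json" \
    -d '{"host_role": "master"}'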
- Download the discovery image for the x86_64 architecture.
- Boot the x86_64 architecture hosts using the generated discovery image.
- Start the installation and wait for the cluster to be fully installed.
- Repeat the "Registering a new infrastructure environment" step of the installation. This time, set cpu_architecture to one of the following: ppc64le (for IBM Power), s390x (for IBM Z), or arm64. For example:
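The example is elided above; the same infrastructure environment sketch applies with the architecture swapped, shown here for arm64:

$ curl -s -X POST "$API_URL/api/assisted-install/v2/infra-envs" \
    -H "Authorization: Bearer ${API_TOKEN}" \
    -H "Content-Type: application/json" \
    -d "$(jq --null-input --slurpfile pull_secret <path_to_pull_secret_file> \
         --arg cluster_id "$CLUSTER_ID" '{
      "name": "<infraenv_name>-arm64",
      "cpu_architecture": "arm64",
      "cluster_id": $cluster_id,
      "pull_secret": $pull_secret[0] | tojson
    }')" | jq -r '.id'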
- Repeat the "Adding hosts" step of the installation. This time, set host_role to worker:

Note: For more details, see Assigning Roles to Hosts in Additional Resources.
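As with the master role, a sketch of the equivalent request, assuming the same v2 hosts endpoint:

$ curl -s -X PATCH "$API_URL/api/assisted-install/v2/infra-envs/$INFRA_ENV_ID/hosts/<host_id>" \
    -H "Authorization: Bearer ${API_TOKEN}" \
    -H "Content-Type: application/json" \
    -d '{"host_role": "worker"}'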
- Download the discovery image for the arm64, ppc64le, or s390x architecture.
- Boot the arm64, ppc64le, or s390x hosts using the generated discovery image.
- Start the installation and wait for the cluster to be fully installed.
Verification
- View the arm64, ppc64le, or s390x worker nodes in the cluster by running the following command:

$ oc get nodes -o wide
12.6. Installing a primary control plane node on a healthy cluster
This procedure describes how to install a primary control plane node on a healthy OpenShift Container Platform cluster.
If the cluster is unhealthy, additional operations are required before you can manage it. See Additional Resources for more information.
Prerequisites
Procedure
Review and approve CSRs
- Review the CertificateSigningRequests (CSRs):

$ oc get csr | grep Pending

Example output

csr-5sd59   8m19s   kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   <none>   Pending
csr-xzqts   10s     kubernetes.io/kubelet-serving                 system:node:worker-6                                                         <none>   Pending

- Approve all pending CSRs:

$ oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | xargs --no-run-if-empty oc adm certificate approve

Important: You must approve the CSRs to complete the installation.
- Confirm the primary node is in Ready status:

$ oc get nodes

Note: The etcd-operator requires a Machine Custom Resource (CR) referencing the new node when the cluster runs with a functional Machine API.

- Link the Machine CR with BareMetalHost and Node:
  - Create the BareMetalHost CR with a unique .metadata.name value:
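The CR body is elided above. A minimal sketch of what <filename> might contain for an externally provisioned host; the name, MAC address, and userData secret are illustrative:

apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: custom-master3               # unique .metadata.name
  namespace: openshift-machine-api
spec:
  automatedCleaningMode: metadata
  bootMACAddress: 00:00:00:00:00:02  # illustrative MAC address
  bootMode: UEFI
  customDeploy:
    method: install_coreos
  externallyProvisioned: true        # the host was installed by the Assisted Installer
  online: true
  userData:
    name: master-user-data-managed   # illustrative user-data secret
    namespace: openshift-machine-api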
$ oc create -f <filename>
  - Apply the BareMetalHost CR:

$ oc apply -f <filename>

  - Create the Machine CR with a unique .metadata.name value:
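The CR body is elided here as well. A minimal sketch of the Machine manifest, assuming the BareMetalHost annotation links to the CR created above; the names and cluster label are illustrative:

apiVersion: machine.openshift.io/v1beta1
kind: Machine
metadata:
  name: custom-master3                    # unique .metadata.name
  namespace: openshift-machine-api
  annotations:
    machine.openshift.io/instance-state: externally provisioned
    metal3.io/BareMetalHost: openshift-machine-api/custom-master3
  labels:
    machine.openshift.io/cluster-api-cluster: <cluster_label>  # illustrative cluster label
    machine.openshift.io/cluster-api-machine-role: master
    machine.openshift.io/cluster-api-machine-type: master
spec:
  providerSpec:
    value:
      apiVersion: baremetal.cluster.k8s.io/v1alpha1
      kind: BareMetalMachineProviderSpec
      customDeploy:
        method: install_coreos
      userData:
        name: master-user-data-managed    # illustrative user-data secret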
$ oc create -f <filename>

  - Apply the Machine CR:

$ oc apply -f <filename>

  - Link BareMetalHost, Machine, and Node using the link-machine-and-node.sh script:

$ bash link-machine-and-node.sh custom-master3 worker-5
- Confirm the etcd members:

$ oc rsh -n openshift-etcd etcd-worker-2
etcdctl member list -w table

- Confirm the etcd-operator configuration applies to all nodes:

$ oc get clusteroperator etcd

Example output

NAME   VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
etcd   4.11.5    True        False         False      5h54m
- Confirm etcd-operator health:

$ oc rsh -n openshift-etcd etcd-worker-0
etcdctl endpoint health

Example output

192.168.111.26 is healthy: committed proposal: took = 11.297561ms
192.168.111.25 is healthy: committed proposal: took = 13.892416ms
192.168.111.28 is healthy: committed proposal: took = 11.870755ms

- Confirm node health:
$ oc get Nodes

- Confirm the ClusterOperators health:

$ oc get ClusterOperators

- Confirm the ClusterVersion:

$ oc get ClusterVersion

Example output

NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.5    True        False         5h57m   Cluster version is 4.11.5

- Remove the old control plane node:
  - Delete the BareMetalHost CR:

$ oc delete bmh -n openshift-machine-api custom-master3

  - Confirm the Machine is unhealthy:

$ oc get machine -A

  - Delete the Machine CR:

$ oc delete machine -n openshift-machine-api test-day2-1-6qv96-master-0
machine.machine.openshift.io "test-day2-1-6qv96-master-0" deleted

  - Confirm removal of the Node CR:

$ oc get nodes
- Check the etcd-operator logs to confirm the status of the etcd cluster:

$ oc logs -n openshift-etcd-operator etcd-operator-8668df65d-lvpjf

Example output

E0927 07:53:10.597523       1 base_controller.go:272] ClusterMemberRemovalController reconciliation failed: cannot remove member: 192.168.111.23 because it is reported as healthy but it doesn't have a machine nor a node resource

- Remove the physical machine to allow etcd-operator to reconcile the cluster members:

$ oc rsh -n openshift-etcd etcd-worker-2
etcdctl member list -w table; etcdctl endpoint health
12.7. Installing a primary control plane node on an unhealthy cluster
This procedure describes how to install a primary control plane node on an unhealthy OpenShift Container Platform cluster.
Prerequisites
Procedure
- Confirm the initial state of the cluster:

$ oc get nodes

- Confirm that the etcd-operator detects the cluster as unhealthy:

$ oc logs -n openshift-etcd-operator etcd-operator-8668df65d-lvpjf

Example output

E0927 08:24:23.983733       1 base_controller.go:272] DefragController reconciliation failed: cluster is unhealthy: 2 of 3 members are available, worker-2 is unhealthy

- Confirm the etcd members:

$ oc rsh -n openshift-etcd etcd-worker-3
etcdctl member list -w table

- Confirm that etcdctl reports an unhealthy member of the cluster:

etcdctl endpoint health

Example output

{"level":"warn","ts":"2022-09-27T08:25:35.953Z","logger":"client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000680380/192.168.111.25","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = \"transport: Error while dialing dial tcp 192.168.111.25: connect: no route to host\""}
192.168.111.28 is healthy: committed proposal: took = 12.465641ms
192.168.111.26 is healthy: committed proposal: took = 12.297059ms
192.168.111.25 is unhealthy: failed to commit proposal: context deadline exceeded
Error: unhealthy cluster
- Remove the unhealthy control plane by deleting the Machine Custom Resource:

$ oc delete machine -n openshift-machine-api test-day2-1-6qv96-master-2

Note: The Machine and Node Custom Resources (CRs) will not be deleted if the unhealthy cluster cannot run successfully.

- Confirm that etcd-operator has not removed the unhealthy machine:

$ oc logs -n openshift-etcd-operator etcd-operator-8668df65d-lvpjf -f

Example output

I0927 08:58:41.249222       1 machinedeletionhooks.go:135] skip removing the deletion hook from machine test-day2-1-6qv96-master-2 since its member is still present with any of: [{InternalIP } {InternalIP 192.168.111.26}]
- Remove the unhealthy etcd member manually. First, list the members:

$ oc rsh -n openshift-etcd etcd-worker-3
etcdctl member list -w table

- Confirm that etcdctl reports an unhealthy member of the cluster:

etcdctl endpoint health

Example output

{"level":"warn","ts":"2022-09-27T10:31:07.227Z","logger":"client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0000d6e00/192.168.111.25","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = \"transport: Error while dialing dial tcp 192.168.111.25: connect: no route to host\""}
192.168.111.28 is healthy: committed proposal: took = 13.038278ms
192.168.111.26 is healthy: committed proposal: took = 12.950355ms
192.168.111.25 is unhealthy: failed to commit proposal: context deadline exceeded
Error: unhealthy cluster
- Remove the unhealthy etcd member by running the following command:

etcdctl member remove 61e2a86084aafa62

Example output

Member 61e2a86084aafa62 removed from cluster 6881c977b97990d7

- Confirm the etcd members by running the following command:

etcdctl member list -w table

Review and approve Certificate Signing Requests
- Review the Certificate Signing Requests (CSRs):

$ oc get csr | grep Pending

Example output

csr-5sd59   8m19s   kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   <none>   Pending
csr-xzqts   10s     kubernetes.io/kubelet-serving                 system:node:worker-6                                                         <none>   Pending

- Approve all pending CSRs:

$ oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | xargs --no-run-if-empty oc adm certificate approve

Note: You must approve the CSRs to complete the installation.
- Confirm the ready status of the control plane node:

$ oc get nodes

- Validate the Machine, Node, and BareMetalHost Custom Resources.

The etcd-operator requires Machine CRs to be present if the cluster is running with the functional Machine API. Machine CRs are displayed during the Running phase when present.

- Create a Machine Custom Resource linked with BareMetalHost and Node. Make sure there is a Machine CR referencing the newly added node.

Important: Boot-it-yourself will not create BareMetalHost and Machine CRs, so you must create them. Failure to create the BareMetalHost and Machine CRs will generate errors when running etcd-operator.
  - Add the BareMetalHost Custom Resource (for example, custom-master3) from a manifest file:

$ oc create -f <filename>

  - Add the Machine Custom Resource from a manifest file:

$ oc create -f <filename>

  - Link BareMetalHost, Machine, and Node by running the link-machine-and-node.sh script:

$ bash link-machine-and-node.sh custom-master3 worker-3
- Confirm the etcd members by running the following command:

$ oc rsh -n openshift-etcd etcd-worker-3
etcdctl member list -w table

- Confirm the etcd operator has configured all nodes:

$ oc get clusteroperator etcd

Example output

NAME   VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
etcd   4.11.5    True        False         False      22h

- Confirm etcd health by running the following command:

$ oc rsh -n openshift-etcd etcd-worker-3
etcdctl endpoint health

Example output

192.168.111.26 is healthy: committed proposal: took = 9.105375ms
192.168.111.28 is healthy: committed proposal: took = 9.15205ms
192.168.111.29 is healthy: committed proposal: took = 10.277577ms

- Confirm the health of the nodes:
$ oc get Nodes

- Confirm the health of the ClusterOperators:

$ oc get ClusterOperators

- Confirm the ClusterVersion:

$ oc get ClusterVersion

Example output

NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.5    True        False         22h     Cluster version is 4.11.5