Chapter 12. Expanding the cluster
You can expand a cluster installed with the Assisted Installer by adding hosts using the user interface or the API.
12.1. Checking for multi-architecture support
You must check that your cluster can support multiple architectures before you add a node with a different architecture.
Procedure
- Log in to the cluster using the CLI.
- Check that your cluster uses the multi-architecture payload by running the following command:
$ oc adm release info -o json | jq .metadata.metadata
Verification
If you see the following output, your cluster supports multiple architectures:
{ "release.openshift.io/architecture": "multi" }
12.2. Installing multi-architecture compute clusters
A cluster with an x86_64 or arm64 control plane can support worker nodes that have two different CPU architectures. Multi-architecture clusters combine the strengths of each architecture and support a variety of workloads.
For example, you can add arm64, IBM Power® (ppc64le), or IBM Z® (s390x) worker nodes to an existing OpenShift Container Platform cluster with an x86_64 control plane.
The main steps of the installation are as follows:
- Create and register a multi-architecture compute cluster.
- Create an x86_64 or arm64 infrastructure environment, download the ISO discovery image for the environment, and add the control plane. An arm64 infrastructure environment is available for Amazon Web Services (AWS) and Google Cloud (GC) only.
- Create an arm64, ppc64le, or s390x infrastructure environment, download the ISO discovery images for arm64, ppc64le, or s390x, and add the worker nodes.
Supported platforms
For the supported platforms for each OpenShift Container Platform version, see About clusters with multi-architecture compute machines. Use the appropriate platforms for the version you are installing.
Main steps
- Start the procedure for installing OpenShift Container Platform using the API. For details, see Installing with the Assisted Installer API in the Additional Resources section.
When you reach the "Registering a new cluster" step of the installation, register the cluster as a multi-architecture compute cluster:
$ curl -s -X POST https://api.openshift.com/api/assisted-install/v2/clusters \
  -H "Authorization: Bearer ${API_TOKEN}" \
  -H "Content-Type: application/json" \
  -d "$(jq --null-input \
    --slurpfile pull_secret ~/Downloads/pull-secret.txt '
    {
      "name": "testcluster",
      "openshift_version": "<version-number>-multi",
      "cpu_architecture" : "multi",
      "control_plane_count": "<number>",
      "base_dns_domain": "example.com",
      "pull_secret": $pull_secret[0] | tojson
    }
  ')" | jq '.id'
Note
- Use the -multi suffix for the OpenShift Container Platform version number; for example, "4.18-multi".
- Set the CPU architecture to "multi".
- Set the number of control plane nodes to "3", "4", or "5". The option of 4 or 5 control plane nodes is available from OpenShift Container Platform 4.18 and later. Single-node OpenShift is not supported for a multi-architecture compute cluster. The control_plane_count field replaces high_availability_mode, which is deprecated.
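The infrastructure environment and host steps that follow reference the cluster ID that this request returns. One way to keep it available, assuming you copied the ID that jq printed:
$ export CLUSTER_ID=<cluster_id>
Replace <cluster_id> with the ID returned when you registered the cluster.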
When you reach the "Registering a new infrastructure environment" step of the installation, set
cpu_architecture
tox86_64
:Copy to Clipboard Copied! Toggle word wrap Toggle overflow curl https://api.openshift.com/api/assisted-install/v2/infra-envs \ -H "Authorization: Bearer ${API_TOKEN}" \ -H "Content-Type: application/json" \ -d "$(jq --null-input \ --slurpfile pull_secret ~/Downloads/pull-secret.txt \ --arg cluster_id ${CLUSTER_ID} '
$ curl https://api.openshift.com/api/assisted-install/v2/infra-envs \ -H "Authorization: Bearer ${API_TOKEN}" \ -H "Content-Type: application/json" \ -d "$(jq --null-input \ --slurpfile pull_secret ~/Downloads/pull-secret.txt \ --arg cluster_id ${CLUSTER_ID} ' { "name": "testcluster-infra-env", "image_type":"full-iso", "cluster_id": $cluster_id, "cpu_architecture" : "x86_64" "pull_secret": $pull_secret[0] | tojson } ')" | jq '.id'
When you reach the "Adding hosts" step of the installation, set
host_role
tomaster
:NoteFor more information, see Assigning Roles to Hosts in Additional Resources.
Copy to Clipboard Copied! Toggle word wrap Toggle overflow curl https://api.openshift.com/api/assisted-install/v2/infra-envs/${INFRA_ENV_ID}/hosts/<host_id> \ -X PATCH \ -H "Authorization: Bearer ${API_TOKEN}" \ -H "Content-Type: application/json" \ -d '
$ curl https://api.openshift.com/api/assisted-install/v2/infra-envs/${INFRA_ENV_ID}/hosts/<host_id> \ -X PATCH \ -H "Authorization: Bearer ${API_TOKEN}" \ -H "Content-Type: application/json" \ -d ' { "host_role":"master" } ' | jq
- Download the discovery image for the x86_64 architecture.
- Boot the x86_64 architecture hosts using the generated discovery image.
- Start the installation and wait for the cluster to be fully installed. If you are driving the installation through the API, see the sketch after this list.
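A sketch of starting the installation through the Assisted Installer API, assuming $CLUSTER_ID and $API_TOKEN are already exported from the earlier steps:
$ curl -X POST "https://api.openshift.com/api/assisted-install/v2/clusters/$CLUSTER_ID/actions/install" \
  -H "Authorization: Bearer ${API_TOKEN}"
You can then monitor progress by querying the same cluster endpoint that you used when registering the cluster.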
Repeat the "Registering a new infrastructure environment" step of the installation. This time, set
cpu_architecture
to one of the following:ppc64le
(for IBM Power®),s390x
(for IBM Z®), orarm64
. For example:Copy to Clipboard Copied! Toggle word wrap Toggle overflow curl -s -X POST https://api.openshift.com/api/assisted-install/v2/clusters \ -H "Authorization: Bearer ${API_TOKEN}" \ -H "Content-Type: application/json" \ -d "$(jq --null-input \ --slurpfile pull_secret ~/Downloads/pull-secret.txt '
$ curl -s -X POST https://api.openshift.com/api/assisted-install/v2/clusters \ -H "Authorization: Bearer ${API_TOKEN}" \ -H "Content-Type: application/json" \ -d "$(jq --null-input \ --slurpfile pull_secret ~/Downloads/pull-secret.txt ' { "name": "testcluster", "openshift_version": "4.12", "cpu_architecture" : "arm64" "control_plane_count": "3" "base_dns_domain": "example.com", "pull_secret": $pull_secret[0] | tojson } ')" | jq '.id'
Repeat the "Adding hosts" step of the installation. This time, set
host_role
toworker
:NoteFor more details, see Assigning Roles to Hosts in Additional Resources.
Copy to Clipboard Copied! Toggle word wrap Toggle overflow curl https://api.openshift.com/api/assisted-install/v2/infra-envs/${INFRA_ENV_ID}/hosts/<host_id> \ -X PATCH \ -H "Authorization: Bearer ${API_TOKEN}" \ -H "Content-Type: application/json" \ -d '
$ curl https://api.openshift.com/api/assisted-install/v2/infra-envs/${INFRA_ENV_ID}/hosts/<host_id> \ -X PATCH \ -H "Authorization: Bearer ${API_TOKEN}" \ -H "Content-Type: application/json" \ -d ' { "host_role":"worker" } ' | jq
- Download the discovery image for the arm64, ppc64le, or s390x architecture.
- Boot the arm64, ppc64le, or s390x hosts using the generated discovery image.
- Start the installation and wait for the cluster to be fully installed.
Verification
View the arm64, ppc64le, or s390x worker nodes in the cluster by running the following command:
$ oc get nodes -o wide
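To confirm the architecture of specific workers, you can also filter on the standard kubernetes.io/arch node label. For example, assuming you added arm64 workers:
$ oc get nodes -l kubernetes.io/arch=arm64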
12.3. Adding hosts with the web console
You can add hosts to clusters that were created using the Assisted Installer.
- Adding hosts to Assisted Installer clusters is only supported for clusters running OpenShift Container Platform version 4.11 and later.
- When adding a control plane node during Day 2 operations, ensure that the new node shares the same subnet as the Day 1 network. The subnet is specified in the machineNetwork field of the install-config.yaml file. This requirement applies to cluster-managed networks such as bare metal or vSphere, and not to user-managed networks. One way to check the configured subnet is shown after this list.
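A minimal way to review the Day 1 machineNetwork value on a running cluster, assuming the installation stored the install-config in the cluster-config-v1 config map:
$ oc get cm cluster-config-v1 -n kube-system -o yaml | grep -A 3 machineNetwork
Compare the CIDR listed there with the subnet of the host that you plan to add.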
Procedure
- Log in to OpenShift Cluster Manager and click the cluster that you want to expand.
- Click Add hosts and download the discovery ISO for the new host, adding an SSH public key and configuring cluster-wide proxy settings as needed.
- Optional: Modify ignition files as needed.
- Boot the target host using the discovery ISO, and wait for the host to be discovered in the console.
- Select the host role. It can be either a worker or a control plane host.
- Start the installation.
As the installation proceeds, the installation generates pending certificate signing requests (CSRs) for the host. When prompted, approve the pending CSRs to complete the installation.
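For example, after logging in to the cluster with the oc CLI, you can review and approve the pending requests. This is a minimal sketch; the CSR name is a placeholder:
$ oc get csr | grep Pending
$ oc adm certificate approve <csr_name>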
When the host is successfully installed, it is listed as a host in the cluster web console.
New hosts will be encrypted using the same method as the original cluster.
12.4. Adding hosts with the API
You can add hosts to clusters using the Assisted Installer REST API.
Prerequisites
- Install the Red Hat OpenShift Cluster Manager CLI (ocm).
- Log in to Red Hat OpenShift Cluster Manager as a user with cluster creation privileges.
- Install jq.
- Ensure that all the required DNS records exist for the cluster that you want to expand.
- When adding a control plane node during Day 2 operations, ensure that the new node shares the same subnet as the Day 1 network. The subnet is specified in the machineNetwork field of the install-config.yaml file. This requirement applies to cluster-managed networks such as bare metal or vSphere, and not to user-managed networks.
Procedure
- Authenticate against the Assisted Installer REST API and generate an API token for your session. The generated token is valid for 15 minutes only.
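One way to obtain and export a token, assuming you have already logged in with the ocm CLI, is:
$ export API_TOKEN=$(ocm token)
Because the token expires, re-run this command whenever a request returns an authentication error.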
Set the $API_URL variable by running the following command:
$ export API_URL=<api_url>
Replace <api_url> with the Assisted Installer API URL, for example, https://api.openshift.com
Import the cluster by running the following commands:
Set the $CLUSTER_ID variable. Log in to the cluster and run the following command:
$ export CLUSTER_ID=$(oc get clusterversion -o jsonpath='{.items[].spec.clusterID}')
Display the $CLUSTER_ID variable output:
$ echo ${CLUSTER_ID}
Set the $CLUSTER_REQUEST variable that is used to import the cluster:
$ export CLUSTER_REQUEST=$(jq --null-input --arg openshift_cluster_id "$CLUSTER_ID" '{
    "api_vip_dnsname": "<api_vip>",
    "openshift_cluster_id": "<cluster_id>",
    "name": "<openshift_cluster_name>"
  }')
- Replace <api_vip> with the hostname for the cluster's API server. This can be the DNS domain for the API server or the IP address of the single node which the host can reach. For example, api.compute-1.example.com.
- Replace <cluster_id> with the $CLUSTER_ID output from the previous substep.
- Replace <openshift_cluster_name> with the plain text name for the cluster. The cluster name should match the cluster name that was set during the Day 1 cluster installation.
Import the cluster and set the $CLUSTER_ID variable. Run the following command:
$ CLUSTER_ID=$(curl "$API_URL/api/assisted-install/v2/clusters/import" -H "Authorization: Bearer ${API_TOKEN}" -H 'accept: application/json' -H 'Content-Type: application/json' \ -d "$CLUSTER_REQUEST" | tee /dev/stderr | jq -r '.id')
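To confirm that the import succeeded, you can read the cluster back from the API. A quick check, assuming the variables above are set:
$ curl -s "$API_URL/api/assisted-install/v2/clusters/$CLUSTER_ID" -H "Authorization: Bearer ${API_TOKEN}" | jq -r '.name, .status'
An imported Day 2 cluster typically reports the adding-hosts status.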
Generate the InfraEnv resource for the cluster and set the $INFRA_ENV_ID variable by running the following commands:
- Download the pull secret file from Red Hat OpenShift Cluster Manager at console.redhat.com.
Set the $INFRA_ENV_REQUEST variable:
$ export INFRA_ENV_REQUEST=$(jq --null-input \
    --slurpfile pull_secret <path_to_pull_secret_file> \
    --arg ssh_pub_key "$(cat <path_to_ssh_pub_key>)" \
    --arg cluster_id "$CLUSTER_ID" '{
      "name": "<infraenv_name>",
      "pull_secret": $pull_secret[0] | tojson,
      "cluster_id": $cluster_id,
      "ssh_authorized_key": $ssh_pub_key,
      "image_type": "<iso_image_type>"
    }')
- Replace <path_to_pull_secret_file> with the path to the local file containing the downloaded pull secret from Red Hat OpenShift Cluster Manager at console.redhat.com.
- Replace <path_to_ssh_pub_key> with the path to the public SSH key required to access the host. If you do not set this value, you cannot access the host while in discovery mode.
- Replace <infraenv_name> with the plain text name for the InfraEnv resource.
- Replace <iso_image_type> with the ISO image type, either full-iso or minimal-iso.
Post the $INFRA_ENV_REQUEST to the /v2/infra-envs API and set the $INFRA_ENV_ID variable:
$ INFRA_ENV_ID=$(curl "$API_URL/api/assisted-install/v2/infra-envs" -H "Authorization: Bearer ${API_TOKEN}" -H 'accept: application/json' -H 'Content-Type: application/json' -d "$INFRA_ENV_REQUEST" | tee /dev/stderr | jq -r '.id')
Get the URL of the discovery ISO for the cluster host by running the following command:
$ curl -s "$API_URL/api/assisted-install/v2/infra-envs/$INFRA_ENV_ID" -H "Authorization: Bearer ${API_TOKEN}" | jq -r '.download_url'
Example output
https://api.openshift.com/api/assisted-images/images/41b91e72-c33e-42ee-b80f-b5c5bbf6431a?arch=x86_64&image_token=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJleHAiOjE2NTYwMjYzNzEsInN1YiI6IjQxYjkxZTcyLWMzM2UtNDJlZS1iODBmLWI1YzViYmY2NDMxYSJ9.1EX_VGaMNejMhrAvVRBS7PDPIQtbOOc8LtG8OukE1a4&type=minimal-iso&version=4.12
Download the ISO:
$ curl -L -s '<iso_url>' --output rhcos-live-minimal.iso
Replace <iso_url> with the URL for the ISO from the previous step.
- Boot the new worker host from the downloaded rhcos-live-minimal.iso.
Get the list of hosts in the cluster that are not installed. Keep running the following command until the new host shows up:
$ curl -s "$API_URL/api/assisted-install/v2/clusters/$CLUSTER_ID" -H "Authorization: Bearer ${API_TOKEN}" | jq -r '.hosts[] | select(.status != "installed").id'
Example output
2294ba03-c264-4f11-ac08-2f1bb2f8c296
Set the $HOST_ID variable for the new host, for example:
$ HOST_ID=<host_id>
Replace <host_id> with the host ID from the previous step.
Check that the host is ready to install by running the following command:
Note: Ensure that you copy the entire command, including the complete jq expression.
$ curl -s $API_URL/api/assisted-install/v2/clusters/$CLUSTER_ID -H "Authorization: Bearer ${API_TOKEN}" | jq '
def host_name($host):
  if (.suggested_hostname // "") == "" then
    if (.inventory // "") == "" then
      "Unknown hostname, please wait"
    else
      .inventory | fromjson | .hostname
    end
  else
    .suggested_hostname
  end;

def is_notable($validation):
  ["failure", "pending", "error"] | any(. == $validation.status);

def notable_validations($validations_info):
  [ $validations_info // "{}" | fromjson | to_entries[].value[] | select(is_notable(.)) ];

{
  "Hosts validations": {
    "Hosts": [
      .hosts[] | select(.status != "installed") | {
        "id": .id,
        "name": host_name(.),
        "status": .status,
        "notable_validations": notable_validations(.validations_info)
      }
    ]
  },
  "Cluster validations info": {
    "notable_validations": notable_validations(.validations_info)
  }
}
' -r
Example output
{ "Hosts validations": { "Hosts": [ { "id": "97ec378c-3568-460c-bc22-df54534ff08f", "name": "localhost.localdomain", "status": "insufficient", "notable_validations": [ { "id": "ntp-synced", "status": "failure", "message": "Host couldn't synchronize with any NTP server" }, { "id": "api-domain-name-resolved-correctly", "status": "error", "message": "Parse error for domain name resolutions result" }, { "id": "api-int-domain-name-resolved-correctly", "status": "error", "message": "Parse error for domain name resolutions result" }, { "id": "apps-domain-name-resolved-correctly", "status": "error", "message": "Parse error for domain name resolutions result" } ] } ] }, "Cluster validations info": { "notable_validations": [] } }
When the previous command shows that the host is ready, start the installation using the /v2/infra-envs/{infra_env_id}/hosts/{host_id}/actions/install API by running the following command:
$ curl -X POST -s "$API_URL/api/assisted-install/v2/infra-envs/$INFRA_ENV_ID/hosts/$HOST_ID/actions/install" -H "Authorization: Bearer ${API_TOKEN}"
As the installation proceeds, the installation generates pending certificate signing requests (CSRs) for the host.
Important: You must approve the CSRs to complete the installation.
Keep running the following API call to monitor the cluster installation:
$ curl -s "$API_URL/api/assisted-install/v2/clusters/$CLUSTER_ID" -H "Authorization: Bearer ${API_TOKEN}" | jq '{ "Cluster day-2 hosts": [ .hosts[] | select(.status != "installed") | {id, requested_hostname, status, status_info, progress, status_updated_at, updated_at, infra_env_id, cluster_id, created_at} ] }'
Example output
{ "Cluster day-2 hosts": [ { "id": "a1c52dde-3432-4f59-b2ae-0a530c851480", "requested_hostname": "control-plane-1", "status": "added-to-existing-cluster", "status_info": "Host has rebooted and no further updates will be posted. Please check console for progress and to possibly approve pending CSRs", "progress": { "current_stage": "Done", "installation_percentage": 100, "stage_started_at": "2022-07-08T10:56:20.476Z", "stage_updated_at": "2022-07-08T10:56:20.476Z" }, "status_updated_at": "2022-07-08T10:56:20.476Z", "updated_at": "2022-07-08T10:57:15.306369Z", "infra_env_id": "b74ec0c3-d5b5-4717-a866-5b6854791bd3", "cluster_id": "8f721322-419d-4eed-aa5b-61b50ea586ae", "created_at": "2022-07-06T22:54:57.161614Z" } ] }
Optional: Run the following command to see all the events for the cluster:
$ curl -s "$API_URL/api/assisted-install/v2/events?cluster_id=$CLUSTER_ID" -H "Authorization: Bearer ${API_TOKEN}" | jq -c '.[] | {severity, message, event_time, host_id}'
Example output
{"severity":"info","message":"Host compute-0: updated status from insufficient to known (Host is ready to be installed)","event_time":"2022-07-08T11:21:46.346Z","host_id":"9d7b3b44-1125-4ad0-9b14-76550087b445"} {"severity":"info","message":"Host compute-0: updated status from known to installing (Installation is in progress)","event_time":"2022-07-08T11:28:28.647Z","host_id":"9d7b3b44-1125-4ad0-9b14-76550087b445"} {"severity":"info","message":"Host compute-0: updated status from installing to installing-in-progress (Starting installation)","event_time":"2022-07-08T11:28:52.068Z","host_id":"9d7b3b44-1125-4ad0-9b14-76550087b445"} {"severity":"info","message":"Uploaded logs for host compute-0 cluster 8f721322-419d-4eed-aa5b-61b50ea586ae","event_time":"2022-07-08T11:29:47.802Z","host_id":"9d7b3b44-1125-4ad0-9b14-76550087b445"} {"severity":"info","message":"Host compute-0: updated status from installing-in-progress to added-to-existing-cluster (Host has rebooted and no further updates will be posted. Please check console for progress and to possibly approve pending CSRs)","event_time":"2022-07-08T11:29:48.259Z","host_id":"9d7b3b44-1125-4ad0-9b14-76550087b445"} {"severity":"info","message":"Host: compute-0, reached installation stage Rebooting","event_time":"2022-07-08T11:29:48.261Z","host_id":"9d7b3b44-1125-4ad0-9b14-76550087b445"}
- Log in to the cluster and approve the pending CSRs to complete the installation.
Verification
Check that the new host was successfully added to the cluster with a status of Ready:
$ oc get nodes
Example output
NAME                          STATUS   ROLES           AGE   VERSION
control-plane-1.example.com   Ready    master,worker   56m   v1.25.0
compute-1.example.com         Ready    worker          11m   v1.25.0
12.5. Replacing a control plane node in a healthy cluster
You can replace a control plane (master) node in a healthy OpenShift Container Platform cluster that has three to five control plane nodes, by adding a new control plane node and removing an existing control plane node.
If the cluster is unhealthy, you must perform additional operations before you can manage the control plane nodes. See Replacing a control plane node in an unhealthy cluster for more information.
12.5.1. Adding a new control plane node
Add the new control plane node, and verify that it is healthy. In the example below, the new node is node-5.
Prerequisites
- You are using OpenShift Container Platform 4.11 or later.
- You have installed a healthy cluster with at least three control plane nodes.
- You have created a single control plane node to be added to the cluster for Day 2.
Procedure
Retrieve pending Certificate Signing Requests (CSRs) for the new Day 2 control plane node:
$ oc get csr | grep Pending
Example output
csr-5sd59   8m19s   kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   <none>   Pending
csr-xzqts   10s     kubernetes.io/kubelet-serving                 system:node:node-5                                                           <none>   Pending
Approve all pending CSRs for the new node (node-5 in this example):
$ oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | xargs --no-run-if-empty oc adm certificate approve
Important: You must approve the CSRs to complete the installation.
Confirm that the new control plane node is in Ready status:
$ oc get nodes
Example output
NAME     STATUS   ROLES    AGE     VERSION
node-0   Ready    master   4h42m   v1.24.0+3882f8f
node-1   Ready    master   4h27m   v1.24.0+3882f8f
node-2   Ready    master   4h43m   v1.24.0+3882f8f
node-3   Ready    worker   4h29m   v1.24.0+3882f8f
node-4   Ready    worker   4h30m   v1.24.0+3882f8f
node-5   Ready    master   105s    v1.24.0+3882f8f
Note: The etcd operator requires a Machine custom resource (CR) that references the new node when the cluster runs with a Machine API. The Machine API is automatically activated when the cluster has three or more control plane nodes.
Create the BareMetalHost and Machine CRs and link them to the new control plane's Node CR.
Create the BareMetalHost CR with a unique .metadata.name value:
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: node-5
  namespace: openshift-machine-api
spec:
  automatedCleaningMode: metadata
  bootMACAddress: 00:00:00:00:00:02
  bootMode: UEFI
  customDeploy:
    method: install_coreos
  externallyProvisioned: true
  online: true
  userData:
    name: master-user-data-managed
    namespace: openshift-machine-api
Apply the BareMetalHost CR:
$ oc apply -f <filename>
Replace <filename> with the name of the BareMetalHost CR.
Create the Machine CR using the unique .metadata.name value:
apiVersion: machine.openshift.io/v1beta1
kind: Machine
metadata:
  annotations:
    machine.openshift.io/instance-state: externally provisioned
    metal3.io/BareMetalHost: openshift-machine-api/node-5
  finalizers:
  - machine.machine.openshift.io
  labels:
    machine.openshift.io/cluster-api-cluster: <cluster_name>
    machine.openshift.io/cluster-api-machine-role: master
    machine.openshift.io/cluster-api-machine-type: master
  name: node-5
  namespace: openshift-machine-api
spec:
  metadata: {}
  providerSpec:
    value:
      apiVersion: baremetal.cluster.k8s.io/v1alpha1
      customDeploy:
        method: install_coreos
      hostSelector: {}
      image:
        checksum: ""
        url: ""
      kind: BareMetalMachineProviderSpec
      metadata:
        creationTimestamp: null
      userData:
        name: master-user-data-managed
Replace <cluster_name> with the name of the specific cluster, for example, test-day2-1-6qv96.
To get the cluster name, run the following command:
$ oc get infrastructure cluster -o=jsonpath='{.status.infrastructureName}{"\n"}'
Apply the Machine CR:
$ oc apply -f <filename>
Replace <filename> with the name of the Machine CR.
Link BareMetalHost, Machine, and Node by running the link-machine-and-node.sh script:
Copy the link-machine-and-node.sh script below to a local machine:
#!/bin/bash
# Credit goes to
# https://bugzilla.redhat.com/show_bug.cgi?id=1801238.
# This script will link Machine object
# and Node object. This is needed
# in order to have IP address of
# the Node present in the status of the Machine.

set -e

machine="$1"
node="$2"

if [ -z "$machine" ] || [ -z "$node" ]; then
    echo "Usage: $0 MACHINE NODE"
    exit 1
fi

node_name=$(echo "${node}" | cut -f2 -d':')

oc proxy &
proxy_pid=$!
function kill_proxy {
    kill $proxy_pid
}
trap kill_proxy EXIT SIGINT

HOST_PROXY_API_PATH="http://localhost:8001/apis/metal3.io/v1alpha1/namespaces/openshift-machine-api/baremetalhosts"

function print_nics() {
    local ips
    local eob
    declare -a ips
    readarray -t ips < <(echo "${1}" \
        | jq '.[] | select(. | .type == "InternalIP") | .address' \
        | sed 's/"//g')
    eob=','
    for (( i=0; i<${#ips[@]}; i++ )); do
        if [ $((i+1)) -eq ${#ips[@]} ]; then
            eob=""
        fi
        cat <<- EOF
          {
            "ip": "${ips[$i]}",
            "mac": "00:00:00:00:00:00",
            "model": "unknown",
            "speedGbps": 10,
            "vlanId": 0,
            "pxe": true,
            "name": "eth1"
          }${eob}
EOF
    done
}

function wait_for_json() {
    local name
    local url
    local curl_opts
    local timeout

    local start_time
    local curr_time
    local time_diff

    name="$1"
    url="$2"
    timeout="$3"
    shift 3
    curl_opts="$@"
    echo -n "Waiting for $name to respond"
    start_time=$(date +%s)
    until curl -g -X GET "$url" "${curl_opts[@]}" 2> /dev/null | jq '.' 2> /dev/null > /dev/null; do
        echo -n "."
        curr_time=$(date +%s)
        time_diff=$((curr_time - start_time))
        if [[ $time_diff -gt $timeout ]]; then
            printf '\nTimed out waiting for %s' "${name}"
            return 1
        fi
        sleep 5
    done
    echo " Success!"
    return 0
}

wait_for_json oc_proxy "${HOST_PROXY_API_PATH}" 10 -H "Accept: application/json" -H "Content-Type: application/json"

addresses=$(oc get node -n openshift-machine-api "${node_name}" -o json | jq -c '.status.addresses')

machine_data=$(oc get machines.machine.openshift.io -n openshift-machine-api -o json "${machine}")
host=$(echo "$machine_data" | jq '.metadata.annotations["metal3.io/BareMetalHost"]' | cut -f2 -d/ | sed 's/"//g')

if [ -z "$host" ]; then
    echo "Machine $machine is not linked to a host yet." 1>&2
    exit 1
fi

# The address structure on the host doesn't match the node, so extract
# the values we want into separate variables so we can build the patch
# we need.
hostname=$(echo "${addresses}" | jq '.[] | select(. | .type == "Hostname") | .address' | sed 's/"//g')

set +e
read -r -d '' host_patch << EOF
{
  "status": {
    "hardware": {
      "hostname": "${hostname}",
      "nics": [
$(print_nics "${addresses}")
      ],
      "systemVendor": {
        "manufacturer": "Red Hat",
        "productName": "product name",
        "serialNumber": ""
      },
      "firmware": {
        "bios": {
          "date": "04/01/2014",
          "vendor": "SeaBIOS",
          "version": "1.11.0-2.el7"
        }
      },
      "ramMebibytes": 0,
      "storage": [],
      "cpu": {
        "arch": "x86_64",
        "model": "Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz",
        "clockMegahertz": 2199.998,
        "count": 4,
        "flags": []
      }
    }
  }
}
EOF
set -e

echo "PATCHING HOST"
echo "${host_patch}" | jq .

curl -s \
    -X PATCH \
    "${HOST_PROXY_API_PATH}/${host}/status" \
    -H "Content-type: application/merge-patch+json" \
    -d "${host_patch}"

oc get baremetalhost -n openshift-machine-api -o yaml "${host}"
Make the script executable:
$ chmod +x link-machine-and-node.sh
Run the script:
$ bash link-machine-and-node.sh node-5 node-5
Note: The first node-5 instance represents the machine, and the second represents the node.
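You can verify that the script linked the objects by checking that the Machine now reports the node's addresses. A sketch, assuming the Machine is named node-5:
$ oc get machine node-5 -n openshift-machine-api -o jsonpath='{.status.addresses}'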
Confirm members of etcd by executing into one of the pre-existing control plane nodes:
Open a remote shell session to the control plane node:
$ oc rsh -n openshift-etcd etcd-node-0
List the etcd members:
# etcdctl member list -w table
Example output
+--------+---------+--------+--------------+--------------+---------+
|   ID   | STATUS  |  NAME  |  PEER ADDRS  | CLIENT ADDRS | LEARNER |
+--------+---------+--------+--------------+--------------+---------+
|76ae1d00| started | node-0 |192.168.111.24|192.168.111.24| false   |
|2c18942f| started | node-1 |192.168.111.26|192.168.111.26| false   |
|61e2a860| started | node-2 |192.168.111.25|192.168.111.25| false   |
|ead5f280| started | node-5 |192.168.111.28|192.168.111.28| false   |
+--------+---------+--------+--------------+--------------+---------+
Monitor the etcd operator configuration process until completion:
$ oc get clusteroperator etcd
Example output (upon completion)
NAME   VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
etcd   4.11.5    True        False         False      5h54m
Confirm etcd health by running the following commands:
Open a remote shell session to the control plane node:
$ oc rsh -n openshift-etcd etcd-node-0
Check endpoint health:
# etcdctl endpoint health
Example output
192.168.111.24 is healthy: committed proposal: took = 10.383651ms
192.168.111.26 is healthy: committed proposal: took = 11.297561ms
192.168.111.25 is healthy: committed proposal: took = 13.892416ms
192.168.111.28 is healthy: committed proposal: took = 11.870755ms
Verify that all nodes are ready:
$ oc get nodes
Example output
NAME     STATUS   ROLES    AGE     VERSION
node-0   Ready    master   6h20m   v1.24.0+3882f8f
node-1   Ready    master   6h20m   v1.24.0+3882f8f
node-2   Ready    master   6h4m    v1.24.0+3882f8f
node-3   Ready    worker   6h7m    v1.24.0+3882f8f
node-4   Ready    worker   6h7m    v1.24.0+3882f8f
node-5   Ready    master   99m     v1.24.0+3882f8f
Verify that the cluster Operators are all available:
$ oc get ClusterOperators
Example output
NAME                                 VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MSG
authentication                       4.11.5    True        False         False      5h57m
baremetal                            4.11.5    True        False         False      6h19m
cloud-controller-manager             4.11.5    True        False         False      6h20m
cloud-credential                     4.11.5    True        False         False      6h23m
cluster-autoscaler                   4.11.5    True        False         False      6h18m
config-operator                      4.11.5    True        False         False      6h19m
console                              4.11.5    True        False         False      6h4m
csi-snapshot-controller              4.11.5    True        False         False      6h19m
dns                                  4.11.5    True        False         False      6h18m
etcd                                 4.11.5    True        False         False      6h17m
image-registry                       4.11.5    True        False         False      6h7m
ingress                              4.11.5    True        False         False      6h6m
insights                             4.11.5    True        False         False      6h12m
kube-apiserver                       4.11.5    True        False         False      6h16m
kube-controller-manager              4.11.5    True        False         False      6h16m
kube-scheduler                       4.11.5    True        False         False      6h16m
kube-storage-version-migrator        4.11.5    True        False         False      6h19m
machine-api                          4.11.5    True        False         False      6h15m
machine-approver                     4.11.5    True        False         False      6h19m
machine-config                       4.11.5    True        False         False      6h18m
marketplace                          4.11.5    True        False         False      6h18m
monitoring                           4.11.5    True        False         False      6h4m
network                              4.11.5    True        False         False      6h20m
node-tuning                          4.11.5    True        False         False      6h18m
openshift-apiserver                  4.11.5    True        False         False      6h8m
openshift-controller-manager         4.11.5    True        False         False      6h7m
openshift-samples                    4.11.5    True        False         False      6h12m
operator-lifecycle-manager           4.11.5    True        False         False      6h18m
operator-lifecycle-manager-catalog   4.11.5    True        False         False      6h19m
operator-lifecycle-manager-pkgsvr    4.11.5    True        False         False      6h12m
service-ca                           4.11.5    True        False         False      6h19m
storage                              4.11.5    True        False         False      6h19m
Verify that the cluster version is correct:
$ oc get ClusterVersion
Example output
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.5    True        False         5h57m   Cluster version is 4.11.5
12.5.2. Removing the existing control plane node
Remove the control plane node that you are replacing. This is node-0 in the example below.
Prerequisites
- You have added a new healthy control plane node.
Procedure
Delete the BareMetalHost CR of the pre-existing control plane node:
$ oc delete bmh -n openshift-machine-api node-0
Confirm that the machine is unhealthy:
$ oc get machine -A
Example output
NAMESPACE               NAME     PHASE     AGE
openshift-machine-api   node-0   Failed    20h
openshift-machine-api   node-1   Running   20h
openshift-machine-api   node-2   Running   20h
openshift-machine-api   node-3   Running   19h
openshift-machine-api   node-4   Running   19h
openshift-machine-api   node-5   Running   14h
Delete the Machine CR:
$ oc delete machine -n openshift-machine-api node-0
machine.machine.openshift.io "node-0" deleted
Confirm removal of the Node CR:
$ oc get nodes
Example output
NAME     STATUS   ROLES    AGE   VERSION
node-1   Ready    master   20h   v1.24.0+3882f8f
node-2   Ready    master   19h   v1.24.0+3882f8f
node-3   Ready    worker   19h   v1.24.0+3882f8f
node-4   Ready    worker   19h   v1.24.0+3882f8f
node-5   Ready    master   15h   v1.24.0+3882f8f
Check the etcd-operator logs to confirm the status of the etcd cluster:
$ oc logs -n openshift-etcd-operator etcd-operator-8668df65d-lvpjf
Example output
E0927 07:53:10.597523 1 base_controller.go:272] ClusterMemberRemovalController reconciliation failed: cannot remove member: 192.168.111.23 because it is reported as healthy but it doesn't have a machine nor a node resource
Remove the physical machine to allow the etcd operator to reconcile the cluster members:
Open a remote shell session to the control plane node:
$ oc rsh -n openshift-etcd etcd-node-1
Monitor the progress of etcd operator reconciliation by checking members and endpoint health:
# etcdctl member list -w table; etcdctl endpoint health
Example output
+--------+---------+--------+--------------+--------------+---------+
|   ID   | STATUS  |  NAME  |  PEER ADDRS  | CLIENT ADDRS | LEARNER |
+--------+---------+--------+--------------+--------------+---------+
|2c18942f| started | node-1 |192.168.111.26|192.168.111.26| false   |
|61e2a860| started | node-2 |192.168.111.25|192.168.111.25| false   |
|ead4f280| started | node-5 |192.168.111.28|192.168.111.28| false   |
+--------+---------+--------+--------------+--------------+---------+
192.168.111.26 is healthy: committed proposal: took = 10.458132ms
192.168.111.25 is healthy: committed proposal: took = 11.047349ms
192.168.111.28 is healthy: committed proposal: took = 11.414402ms
12.6. Replacing a control plane node in an unhealthy cluster
You can replace an unhealthy control plane (master) node in an OpenShift Container Platform cluster that has three to five control plane nodes, by removing the unhealthy control plane node and adding a new one.
For details on replacing a control plane node in a healthy cluster, see Replacing a control plane node in a healthy cluster.
12.6.1. Removing an unhealthy control plane node
Remove the unhealthy control plane node from the cluster. This is node-0 in the example below.
Prerequisites
- You have installed a cluster with at least three control plane nodes.
- At least one of the control plane nodes is not ready.
Procedure
Check the node status to confirm that a control plane node is not ready:
$ oc get nodes
Example output
NAME     STATUS     ROLES    AGE   VERSION
node-0   NotReady   master   20h   v1.24.0+3882f8f
node-1   Ready      master   20h   v1.24.0+3882f8f
node-2   Ready      master   20h   v1.24.0+3882f8f
node-3   Ready      worker   20h   v1.24.0+3882f8f
node-4   Ready      worker   20h   v1.24.0+3882f8f
Confirm in the etcd-operator logs that the cluster is unhealthy:
$ oc logs -n openshift-etcd-operator deployment/etcd-operator
Example output
E0927 08:24:23.983733 1 base_controller.go:272] DefragController reconciliation failed: cluster is unhealthy: 2 of 3 members are available, node-0 is unhealthy
Confirm the etcd members by running the following commands:
Open a remote shell session to the control plane node:
$ oc rsh -n openshift-etcd etcd-node-1
List the etcd members:
# etcdctl member list -w table
Example output
+--------+---------+--------+--------------+--------------+---------+
|   ID   | STATUS  |  NAME  |  PEER ADDRS  | CLIENT ADDRS | LEARNER |
+--------+---------+--------+--------------+--------------+---------+
|61e2a860| started | node-0 |192.168.111.25|192.168.111.25| false   |
|2c18942f| started | node-1 |192.168.111.26|192.168.111.26| false   |
|ead4f280| started | node-2 |192.168.111.28|192.168.111.28| false   |
+--------+---------+--------+--------------+--------------+---------+
Confirm that etcdctl endpoint health reports an unhealthy member of the cluster:
# etcdctl endpoint health
Example output
{"level":"warn","ts":"2022-09-27T08:25:35.953Z","logger":"client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000680380/192.168.111.25","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = \"transport: Error while dialing dial tcp 192.168.111.25: connect: no route to host\""} 192.168.111.28 is healthy: committed proposal: took = 12.465641ms 192.168.111.26 is healthy: committed proposal: took = 12.297059ms 192.168.111.25 is unhealthy: failed to commit proposal: context deadline exceeded Error: unhealthy cluster
Remove the unhealthy control plane node by deleting the Machine custom resource (CR):
$ oc delete machine -n openshift-machine-api node-0
Note: The Machine and Node CRs might not be deleted because they are protected by finalizers. If this occurs, you must delete the Machine CR manually by removing all finalizers, for example as shown below.
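A minimal way to clear the finalizers, assuming the stuck Machine CR is node-0, is to patch them away:
$ oc patch machine node-0 -n openshift-machine-api --type merge -p '{"metadata":{"finalizers":null}}'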
Verify in the etcd-operator logs whether the unhealthy machine has been removed:
$ oc logs -n openshift-etcd-operator deployment/etcd-operator
Example output
I0927 08:58:41.249222 1 machinedeletionhooks.go:135] skip removing the deletion hook from machine node-0 since its member is still present with any of: [{InternalIP } {InternalIP 192.168.111.25}]
If you see that removal has been skipped, as in the above log example, manually remove the unhealthy etcd member:
Open a remote shell session to the control plane node:
$ oc rsh -n openshift-etcd etcd-node-1
List the etcd members:
# etcdctl member list -w table
Example output
+--------+---------+--------+--------------+--------------+---------+
|   ID   | STATUS  |  NAME  |  PEER ADDRS  | CLIENT ADDRS | LEARNER |
+--------+---------+--------+--------------+--------------+---------+
|61e2a860| started | node-0 |192.168.111.25|192.168.111.25| false   |
|2c18942f| started | node-1 |192.168.111.26|192.168.111.26| false   |
|ead4f280| started | node-2 |192.168.111.28|192.168.111.28| false   |
+--------+---------+--------+--------------+--------------+---------+
Confirm that etcdctl endpoint health reports an unhealthy member of the cluster:
# etcdctl endpoint health
Example output
{"level":"warn","ts":"2022-09-27T10:31:07.227Z","logger":"client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0000d6e00/192.168.111.25","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = \"transport: Error while dialing dial tcp 192.168.111.25: connect: no route to host\""} 192.168.111.28 is healthy: committed proposal: took = 13.038278ms 192.168.111.26 is healthy: committed proposal: took = 12.950355ms 192.168.111.25 is unhealthy: failed to commit proposal: context deadline exceeded Error: unhealthy cluster
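As an optional variant, you can check every member at once and print the result as a table. This assumes an `etcdctl` v3 client that is recent enough to support the `--cluster` and `-w table` options:

# etcdctl endpoint health --cluster -w table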
Remove the unhealthy etcd member from the cluster:
# etcdctl member remove 61e2a86084aafa62
Example output
Member 61e2a86084aafa62 removed from cluster 6881c977b97990d7
Verify that the unhealthy etcd member was removed by running the following command:
# etcdctl member list -w table
Example output
+----------+---------+--------+----------------+----------------+---------+
|    ID    | STATUS  |  NAME  |   PEER ADDRS   |  CLIENT ADDRS  | LEARNER |
+----------+---------+--------+----------------+----------------+---------+
| 2c18942f | started | node-1 | 192.168.111.26 | 192.168.111.26 | false   |
| ead4f280 | started | node-2 | 192.168.111.28 | 192.168.111.28 | false   |
+----------+---------+--------+----------------+----------------+---------+
12.6.2. Adding a new control plane node
Add a new control plane node to replace the unhealthy node that you removed. In the example below, the new node is `node-5`.
Prerequisites
- You have installed a Day 2 control plane node. For more information, see Adding hosts with the web console or Adding hosts with the API.
Procedure
Retrieve pending Certificate Signing Requests (CSRs) for the new Day 2 control plane node:
$ oc get csr | grep Pending
Example output
csr-5sd59   8m19s   kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   <none>   Pending
csr-xzqts   10s     kubernetes.io/kubelet-serving                 system:node:node-5                                                           <none>   Pending
Approve all pending CSRs for the new node (`node-5` in this example):
$ oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | xargs --no-run-if-empty oc adm certificate approve
Note: You must approve the CSRs to complete the installation.
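If you prefer to approve CSRs individually rather than in bulk, you can approve each pending request by name. The CSR name in this example is taken from the sample output above:

$ oc adm certificate approve csr-5sd59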
Confirm that the control plane node is in `Ready` status:
$ oc get nodes
Example output
NAME     STATUS   ROLES    AGE     VERSION
node-1   Ready    master   20h     v1.24.0+3882f8f
node-2   Ready    master   20h     v1.24.0+3882f8f
node-3   Ready    worker   20h     v1.24.0+3882f8f
node-4   Ready    worker   20h     v1.24.0+3882f8f
node-5   Ready    master   2m52s   v1.24.0+3882f8f
The `etcd` operator requires a `Machine` CR that references the new node when the cluster runs with the Machine API. The Machine API is automatically activated when the cluster has three control plane nodes.

Create the `BareMetalHost` and `Machine` CRs and link them to the new control plane node's `Node` CR.

Important: Boot-it-yourself does not create the `BareMetalHost` and `Machine` CRs, so you must create them yourself. Failure to create the `BareMetalHost` and `Machine` CRs generates errors in the `etcd` operator.

Create the `BareMetalHost` CR with a unique `.metadata.name` value:
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: node-5
  namespace: openshift-machine-api
spec:
  automatedCleaningMode: metadata
  bootMACAddress: 00:00:00:00:00:02
  bootMode: UEFI
  customDeploy:
    method: install_coreos
  externallyProvisioned: true
  online: true
  userData:
    name: master-user-data-managed
    namespace: openshift-machine-api
Apply the `BareMetalHost` CR:
$ oc apply -f <filename>
Replace `<filename>` with the name of the file that contains the `BareMetalHost` CR.
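Optionally, confirm that the `BareMetalHost` CR was created. This check assumes the example name `node-5`:

$ oc get baremetalhost -n openshift-machine-api node-5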
Create the `Machine` CR using the unique `.metadata.name` value:
apiVersion: machine.openshift.io/v1beta1
kind: Machine
metadata:
  annotations:
    machine.openshift.io/instance-state: externally provisioned
    metal3.io/BareMetalHost: openshift-machine-api/node-5
  finalizers:
  - machine.machine.openshift.io
  labels:
    machine.openshift.io/cluster-api-cluster: test-day2-1-6qv96
    machine.openshift.io/cluster-api-machine-role: master
    machine.openshift.io/cluster-api-machine-type: master
  name: node-5
  namespace: openshift-machine-api
spec:
  metadata: {}
  providerSpec:
    value:
      apiVersion: baremetal.cluster.k8s.io/v1alpha1
      customDeploy:
        method: install_coreos
      hostSelector: {}
      image:
        checksum: ""
        url: ""
      kind: BareMetalMachineProviderSpec
      metadata:
        creationTimestamp: null
      userData:
        name: master-user-data-managed
Apply the `Machine` CR:
$ oc apply -f <filename>
Replace `<filename>` with the name of the file that contains the `Machine` CR.
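Optionally, confirm that the `Machine` CR was created before linking it to the node. This check assumes the example name `node-5`:

$ oc get machines.machine.openshift.io -n openshift-machine-api node-5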
Link the `BareMetalHost`, `Machine`, and `Node` CRs by running the `link-machine-and-node.sh` script:

Copy the `link-machine-and-node.sh` script below to a local machine:
#!/bin/bash
# Credit goes to
# https://bugzilla.redhat.com/show_bug.cgi?id=1801238.
# This script will link Machine object
# and Node object. This is needed
# in order to have IP address of
# the Node present in the status of the Machine.

set -e

machine="$1"
node="$2"

if [ -z "$machine" ] || [ -z "$node" ]; then
    echo "Usage: $0 MACHINE NODE"
    exit 1
fi

node_name=$(echo "${node}" | cut -f2 -d':')

oc proxy &
proxy_pid=$!
function kill_proxy {
    kill $proxy_pid
}
trap kill_proxy EXIT SIGINT

HOST_PROXY_API_PATH="http://localhost:8001/apis/metal3.io/v1alpha1/namespaces/openshift-machine-api/baremetalhosts"

function print_nics() {
    local ips
    local eob
    declare -a ips

    readarray -t ips < <(echo "${1}" \
        | jq '.[] | select(. | .type == "InternalIP") | .address' \
        | sed 's/"//g')

    eob=','
    for (( i=0; i<${#ips[@]}; i++ )); do
        if [ $((i+1)) -eq ${#ips[@]} ]; then
            eob=""
        fi
        cat <<- EOF
          {
            "ip": "${ips[$i]}",
            "mac": "00:00:00:00:00:00",
            "model": "unknown",
            "speedGbps": 10,
            "vlanId": 0,
            "pxe": true,
            "name": "eth1"
          }${eob}
EOF
    done
}

function wait_for_json() {
    local name
    local url
    local curl_opts
    local timeout

    local start_time
    local curr_time
    local time_diff

    name="$1"
    url="$2"
    timeout="$3"
    shift 3
    curl_opts="$@"
    echo -n "Waiting for $name to respond"
    start_time=$(date +%s)
    until curl -g -X GET "$url" "${curl_opts[@]}" 2> /dev/null | jq '.' 2> /dev/null > /dev/null; do
        echo -n "."
        curr_time=$(date +%s)
        time_diff=$((curr_time - start_time))
        if [[ $time_diff -gt $timeout ]]; then
            printf '\nTimed out waiting for %s' "${name}"
            return 1
        fi
        sleep 5
    done
    echo " Success!"
    return 0
}
wait_for_json oc_proxy "${HOST_PROXY_API_PATH}" 10 -H "Accept: application/json" -H "Content-Type: application/json"

addresses=$(oc get node -n openshift-machine-api "${node_name}" -o json | jq -c '.status.addresses')

machine_data=$(oc get machines.machine.openshift.io -n openshift-machine-api -o json "${machine}")
host=$(echo "$machine_data" | jq '.metadata.annotations["metal3.io/BareMetalHost"]' | cut -f2 -d/ | sed 's/"//g')

if [ -z "$host" ]; then
    echo "Machine $machine is not linked to a host yet." 1>&2
    exit 1
fi

# The address structure on the host doesn't match the node, so extract
# the values we want into separate variables so we can build the patch
# we need.
hostname=$(echo "${addresses}" | jq '.[] | select(. | .type == "Hostname") | .address' | sed 's/"//g')

set +e
read -r -d '' host_patch << EOF
{
  "status": {
    "hardware": {
      "hostname": "${hostname}",
      "nics": [
        $(print_nics "${addresses}")
      ],
      "systemVendor": {
        "manufacturer": "Red Hat",
        "productName": "product name",
        "serialNumber": ""
      },
      "firmware": {
        "bios": {
          "date": "04/01/2014",
          "vendor": "SeaBIOS",
          "version": "1.11.0-2.el7"
        }
      },
      "ramMebibytes": 0,
      "storage": [],
      "cpu": {
        "arch": "x86_64",
        "model": "Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz",
        "clockMegahertz": 2199.998,
        "count": 4,
        "flags": []
      }
    }
  }
}
EOF
set -e

echo "PATCHING HOST"
echo "${host_patch}" | jq .

curl -s \
    -X PATCH \
    "${HOST_PROXY_API_PATH}/${host}/status" \
    -H "Content-type: application/merge-patch+json" \
    -d "${host_patch}"

oc get baremetalhost -n openshift-machine-api -o yaml "${host}"
Make the script executable:
$ chmod +x link-machine-and-node.sh
Run the script:
$ bash link-machine-and-node.sh node-5 node-5
Note: The first `node-5` instance represents the machine, and the second represents the node.
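Optionally, verify that the script patched the host by checking the hardware hostname that it wrote to the `BareMetalHost` status. This is a hedged check that assumes the example name `node-5`:

$ oc get baremetalhost -n openshift-machine-api node-5 -o jsonpath='{.status.hardware.hostname}'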
Confirm the members of `etcd` by running the following commands:

Open a remote shell session to the control plane node:
$ oc rsh -n openshift-etcd node-1
List the etcd members:
# etcdctl member list -w table
Example output
+----------+---------+--------+----------------+----------------+---------+
|    ID    | STATUS  |  NAME  |   PEER ADDRS   |  CLIENT ADDRS  | LEARNER |
+----------+---------+--------+----------------+----------------+---------+
| 2c18942f | started | node-1 | 192.168.111.26 | 192.168.111.26 | false   |
| ead4f280 | started | node-2 | 192.168.111.28 | 192.168.111.28 | false   |
| 79153c5a | started | node-5 | 192.168.111.29 | 192.168.111.29 | false   |
+----------+---------+--------+----------------+----------------+---------+
Monitor the `etcd` operator configuration process until completion:
$ oc get clusteroperator etcd
Example output (upon completion)
NAME   VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
etcd   4.11.5    True        False         False      22h
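Instead of polling manually, you can block until the `etcd` Operator reports that it is no longer progressing. This is a sketch; adjust the timeout to suit your environment:

$ oc wait clusteroperator/etcd --for=condition=Progressing=False --timeout=30m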
Confirm `etcd` health by running the following commands:

Open a remote shell session to the control plane node:
$ oc rsh -n openshift-etcd node-1
Check endpoint health:
# etcdctl endpoint health
Example output
192.168.111.26 is healthy: committed proposal: took = 9.105375ms
192.168.111.28 is healthy: committed proposal: took = 9.15205ms
192.168.111.29 is healthy: committed proposal: took = 10.277577ms
Confirm the health of the nodes:
$ oc get Nodes
Example output
NAME     STATUS   ROLES    AGE   VERSION
node-1   Ready    master   20h   v1.24.0+3882f8f
node-2   Ready    master   20h   v1.24.0+3882f8f
node-3   Ready    worker   20h   v1.24.0+3882f8f
node-4   Ready    worker   20h   v1.24.0+3882f8f
node-5   Ready    master   40m   v1.24.0+3882f8f
Verify that the cluster Operators are all available:
$ oc get ClusterOperators
Example output
NAME                                 VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                       4.11.5    True        False         False      150m
baremetal                            4.11.5    True        False         False      22h
cloud-controller-manager             4.11.5    True        False         False      22h
cloud-credential                     4.11.5    True        False         False      22h
cluster-autoscaler                   4.11.5    True        False         False      22h
config-operator                      4.11.5    True        False         False      22h
console                              4.11.5    True        False         False      145m
csi-snapshot-controller              4.11.5    True        False         False      22h
dns                                  4.11.5    True        False         False      22h
etcd                                 4.11.5    True        False         False      22h
image-registry                       4.11.5    True        False         False      22h
ingress                              4.11.5    True        False         False      22h
insights                             4.11.5    True        False         False      22h
kube-apiserver                       4.11.5    True        False         False      22h
kube-controller-manager              4.11.5    True        False         False      22h
kube-scheduler                       4.11.5    True        False         False      22h
kube-storage-version-migrator        4.11.5    True        False         False      148m
machine-api                          4.11.5    True        False         False      22h
machine-approver                     4.11.5    True        False         False      22h
machine-config                       4.11.5    True        False         False      110m
marketplace                          4.11.5    True        False         False      22h
monitoring                           4.11.5    True        False         False      22h
network                              4.11.5    True        False         False      22h
node-tuning                          4.11.5    True        False         False      22h
openshift-apiserver                  4.11.5    True        False         False      163m
openshift-controller-manager         4.11.5    True        False         False      22h
openshift-samples                    4.11.5    True        False         False      22h
operator-lifecycle-manager           4.11.5    True        False         False      22h
operator-lifecycle-manager-catalog   4.11.5    True        False         False      22h
operator-lifecycle-manager-pkgsvr    4.11.5    True        False         False      22h
service-ca                           4.11.5    True        False         False      22h
storage                              4.11.5    True        False         False      22h
Verify that the cluster version is correct:
$ oc get ClusterVersion
Example output
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.5    True        False         22h     Cluster version is 4.11.5