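12.6. Installing a primary control plane node on an unhealthy cluster
This procedure describes how to install a primary control plane node on an unhealthy OpenShift Container Platform cluster.
Prerequisites
Procedure
Confirm the initial state of the cluster: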
$ oc get nodes
Example output
NAME       STATUS     ROLES    AGE   VERSION
worker-1   Ready      worker   20h   v1.24.0+3882f8f
master-2   NotReady   master   20h   v1.24.0+3882f8f
master-3   Ready      master   20h   v1.24.0+3882f8f
worker-4   Ready      worker   20h   v1.24.0+3882f8f
master-5   Ready      master   15h   v1.24.0+3882f8f
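In a larger cluster it can help to filter this listing down to the nodes that need attention. The following is a minimal sketch, not part of the documented procedure: the function name not_ready_nodes is hypothetical, and it assumes the default column layout of oc get nodes shown above.

```shell
# Print the names of nodes whose STATUS column is not exactly "Ready".
# Reads `oc get nodes` output on stdin and skips the header row.
# Assumption: a cordoned node ("Ready,SchedulingDisabled") is also reported,
# because its STATUS is not exactly "Ready".
not_ready_nodes() {
  awk 'NR > 1 && $2 != "Ready" { print $1 }'
}

# Usage (requires a logged-in oc session):
#   oc get nodes | not_ready_nodes
```

With the example output above, the function would print only master-2.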
Confirm that the etcd-operator detects the cluster as unhealthy:
$ oc logs -n openshift-etcd-operator etcd-operator-8668df65d-lvpjf
Example output
E0927 08:24:23.983733 1 base_controller.go:272] DefragController reconciliation failed: cluster is unhealthy: 2 of 3 members are available, worker-2 is unhealthy
Confirm the etcdctl members:
$ oc rsh -n openshift-etcd etcd-worker-3 etcdctl member list -w table
Example output
+----------+---------+----------+----------------+----------------+---------+
|    ID    | STATUS  |   NAME   |   PEER ADDRS   |  CLIENT ADDRS  | LEARNER |
+----------+---------+----------+----------------+----------------+---------+
| 2c18942f | started | worker-3 | 192.168.111.26 | 192.168.111.26 | false   |
| 61e2a860 | started | worker-2 | 192.168.111.25 | 192.168.111.25 | false   |
| ead4f280 | started | worker-5 | 192.168.111.28 | 192.168.111.28 | false   |
+----------+---------+----------+----------------+----------------+---------+
Confirm that etcdctl reports an unhealthy member of the cluster:
$ etcdctl endpoint health
Example output
{"level":"warn","ts":"2022-09-27T08:25:35.953Z","logger":"client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000680380/192.168.111.25","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = \"transport: Error while dialing dial tcp 192.168.111.25: connect: no route to host\""}
192.168.111.28 is healthy: committed proposal: took = 12.465641ms
192.168.111.26 is healthy: committed proposal: took = 12.297059ms
192.168.111.25 is unhealthy: failed to commit proposal: context deadline exceeded
Error: unhealthy cluster
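The health report mixes a JSON warning with one status line per endpoint. As a hedged sketch, the unhealthy endpoints can be pulled out of that text with awk. The function name unhealthy_endpoints is hypothetical, and the parsing assumes the "<endpoint> is unhealthy: ..." line format shown in the example output.

```shell
# Print the endpoints that etcdctl reports as unhealthy.
# Reads `etcdctl endpoint health` output on stdin.
# Assumption: status lines have the form "<endpoint> is unhealthy: <reason>".
unhealthy_endpoints() {
  awk '$2 == "is" && $3 == "unhealthy:" { print $1 }'
}

# Usage (stderr is merged in case status lines are written there):
#   etcdctl endpoint health 2>&1 | unhealthy_endpoints
```

The JSON warning line and the final "Error: unhealthy cluster" line do not match the pattern, so only endpoint addresses are printed.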
Remove the unhealthy control plane by deleting the Machine custom resource:
$ oc delete machine -n openshift-machine-api test-day2-1-6qv96-master-2
Note: The Machine and Node custom resources (CRs) are not deleted if the unhealthy cluster cannot run successfully.
Confirm that the etcd-operator has not removed the unhealthy machine:
$ oc logs -n openshift-etcd-operator etcd-operator-8668df65d-lvpjf -f
Example output
I0927 08:58:41.249222 1 machinedeletionhooks.go:135] skip removing the deletion hook from machine test-day2-1-6qv96-master-2 since its member is still present with any of: [{InternalIP } {InternalIP 192.168.111.26}]
Remove the unhealthy etcdctl member manually. Start by listing the members:
$ oc rsh -n openshift-etcd etcd-worker-3 etcdctl member list -w table
Example output
+----------+---------+----------+----------------+----------------+---------+
|    ID    | STATUS  |   NAME   |   PEER ADDRS   |  CLIENT ADDRS  | LEARNER |
+----------+---------+----------+----------------+----------------+---------+
| 2c18942f | started | worker-3 | 192.168.111.26 | 192.168.111.26 | false   |
| 61e2a860 | started | worker-2 | 192.168.111.25 | 192.168.111.25 | false   |
| ead4f280 | started | worker-5 | 192.168.111.28 | 192.168.111.28 | false   |
+----------+---------+----------+----------------+----------------+---------+
Confirm that etcdctl reports an unhealthy member of the cluster:
$ etcdctl endpoint health
Example output
{"level":"warn","ts":"2022-09-27T10:31:07.227Z","logger":"client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0000d6e00/192.168.111.25","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = \"transport: Error while dialing dial tcp 192.168.111.25: connect: no route to host\""}
192.168.111.28 is healthy: committed proposal: took = 13.038278ms
192.168.111.26 is healthy: committed proposal: took = 12.950355ms
192.168.111.25 is unhealthy: failed to commit proposal: context deadline exceeded
Error: unhealthy cluster
Remove the unhealthy member from the etcd cluster:
$ etcdctl member remove 61e2a86084aafa62
Example output
Member 61e2a86084aafa62 removed from cluster 6881c977b97990d7
Confirm the members of etcdctl by running the following command:
$ etcdctl member list -w table
Example output
+----------+---------+----------+----------------+----------------+---------+
|    ID    | STATUS  |   NAME   |   PEER ADDRS   |  CLIENT ADDRS  | LEARNER |
+----------+---------+----------+----------------+----------------+---------+
| 2c18942f | started | worker-3 | 192.168.111.26 | 192.168.111.26 | false   |
| ead4f280 | started | worker-5 | 192.168.111.28 | 192.168.111.28 | false   |
+----------+---------+----------+----------------+----------------+---------+
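The ID passed to etcdctl member remove comes from the ID column of the member list. The sketch below looks up that ID by member name in the table output. The function name member_id_by_name is hypothetical, and the parsing assumes the pipe-delimited layout produced by -w table. Note that the tables in this section show abbreviated IDs, whereas a live cluster prints the full ID.

```shell
# Print the ID of the etcd member with the given NAME.
# Reads `etcdctl member list -w table` output on stdin.
# Assumption: rows are pipe-delimited with ID in column 1 and NAME in column 3;
# border rows ("+---+") contain no pipe and are skipped.
member_id_by_name() {
  awk -F'|' -v name="$1" '
    NF > 1 {
      id = $2; member = $4
      gsub(/[[:space:]]/, "", id)
      gsub(/[[:space:]]/, "", member)
      if (member == name) print id
    }'
}

# Usage:
#   etcdctl member list -w table | member_id_by_name worker-2
```

Looking up the ID this way avoids copying it by hand from the table before running the removal.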
Review and approve the certificate signing requests
Review the certificate signing requests (CSRs):
$ oc get csr | grep Pending
Example output
csr-5sd59   8m19s   kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   <none>   Pending
csr-xzqts   10s     kubernetes.io/kubelet-serving                 system:node:worker-6                                                         <none>   Pending
Approve all pending CSRs:
$ oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | xargs --no-run-if-empty oc adm certificate approve
Note: You must approve the CSRs to complete the installation.
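Before approving in bulk, it can be useful to see exactly which CSR names are still pending. The following sketch extracts them from the default oc get csr listing. The function name pending_csrs is hypothetical, and it assumes that the last column of each row is the CONDITION value, as in the example output above.

```shell
# Print the names of CSRs whose last column (CONDITION) is "Pending".
# Reads `oc get csr` output on stdin.
# Assumption: approved requests show a different value such as
# "Approved,Issued" in the last column and are therefore skipped.
pending_csrs() {
  awk '$NF == "Pending" { print $1 }'
}

# Usage (review the names, then approve them individually if preferred):
#   oc get csr | pending_csrs
#   oc adm certificate approve <csr_name>
```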
Confirm the ready status of the control plane node:
$ oc get nodes
Example output
NAME       STATUS   ROLES    AGE     VERSION
worker-1   Ready    worker   22h     v1.24.0+3882f8f
master-3   Ready    master   22h     v1.24.0+3882f8f
worker-4   Ready    worker   22h     v1.24.0+3882f8f
master-5   Ready    master   17h     v1.24.0+3882f8f
master-6   Ready    master   2m52s   v1.24.0+3882f8f
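Validate the Machine, Node, and BareMetalHost custom resources.
The etcd-operator requires Machine CRs to be present if the cluster is running with a functional Machine API. When present, Machine CRs are displayed in the Running phase.
Create a Machine custom resource that is linked to a BareMetalHost and a Node.
Make sure that there is a Machine CR referencing the newly added node.
Important: Boot-it-yourself does not create BareMetalHost and Machine CRs, so you must create them. If you do not create the BareMetalHost and Machine CRs, the etcd-operator generates errors when it runs.
Add the BareMetalHost custom resource:
$ oc create bmh -n openshift-machine-api custom-master3
Add the Machine custom resource:
$ oc create machine -n openshift-machine-api custom-master3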
Link the BareMetalHost, Machine, and Node by running the link-machine-and-node.sh script:
#!/bin/bash

# Credit goes to https://bugzilla.redhat.com/show_bug.cgi?id=1801238.
# This script will link Machine object and Node object. This is needed
# in order to have IP address of the Node present in the status of the Machine.

set -x
set -e

machine="$1"
node="$2"

if [ -z "$machine" ] || [ -z "$node" ]; then
    echo "Usage: $0 MACHINE NODE"
    exit 1
fi

# NODE may be given as "uid:name"; a plain name without ":" is used as-is.
uid=$(echo "$node" | cut -f1 -d':')
node_name=$(echo "$node" | cut -f2 -d':')

# Start a local API proxy and make sure it is stopped when the script exits.
oc proxy &
proxy_pid=$!
function kill_proxy {
    kill $proxy_pid
}
trap kill_proxy EXIT SIGINT

HOST_PROXY_API_PATH="http://localhost:8001/apis/metal3.io/v1alpha1/namespaces/openshift-machine-api/baremetalhosts"

# Poll a URL until it returns valid JSON or the timeout (in seconds) expires.
function wait_for_json() {
    local name
    local url
    local curl_opts
    local timeout

    local start_time
    local curr_time
    local time_diff

    name="$1"
    url="$2"
    timeout="$3"
    shift 3
    # Keep the remaining arguments as an array so each curl option stays intact.
    curl_opts=("$@")
    echo -n "Waiting for $name to respond"
    start_time=$(date +%s)
    until curl -g -X GET "$url" "${curl_opts[@]}" 2> /dev/null | jq '.' 2> /dev/null > /dev/null; do
        echo -n "."
        curr_time=$(date +%s)
        time_diff=$((curr_time - start_time))
        if [[ $time_diff -gt $timeout ]]; then
            printf '\nTimed out waiting for %s\n' "$name"
            return 1
        fi
        sleep 5
    done
    echo " Success!"
    return 0
}
wait_for_json oc_proxy "${HOST_PROXY_API_PATH}" 10 -H "Accept: application/json" -H "Content-Type: application/json"

addresses=$(oc get node -n openshift-machine-api "${node_name}" -o json | jq -c '.status.addresses')

machine_data=$(oc get machine -n openshift-machine-api -o json "${machine}")
# "// empty" yields an empty string instead of "null" when the annotation is missing.
host=$(echo "$machine_data" | jq -r '.metadata.annotations["metal3.io/BareMetalHost"] // empty' | cut -f2 -d/)

if [ -z "$host" ]; then
    echo "Machine $machine is not linked to a host yet." 1>&2
    exit 1
fi

# The address structure on the host doesn't match the node, so extract
# the values we want into separate variables so we can build the patch
# we need.
hostname=$(echo "${addresses}" | jq '.[] | select(. | .type == "Hostname") | .address' | sed 's/"//g')
ipaddr=$(echo "${addresses}" | jq '.[] | select(. | .type == "InternalIP") | .address' | sed 's/"//g')

host_patch='
{
  "status": {
    "hardware": {
      "hostname": "'${hostname}'",
      "nics": [
        {
          "ip": "'${ipaddr}'",
          "mac": "00:00:00:00:00:00",
          "model": "unknown",
          "speedGbps": 10,
          "vlanId": 0,
          "pxe": true,
          "name": "eth1"
        }
      ],
      "systemVendor": {
        "manufacturer": "Red Hat",
        "productName": "product name",
        "serialNumber": ""
      },
      "firmware": {
        "bios": {
          "date": "04/01/2014",
          "vendor": "SeaBIOS",
          "version": "1.11.0-2.el7"
        }
      },
      "ramMebibytes": 0,
      "storage": [],
      "cpu": {
        "arch": "x86_64",
        "model": "Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz",
        "clockMegahertz": 2199.998,
        "count": 4,
        "flags": []
      }
    }
  }
}
'

echo "PATCHING HOST"
echo "${host_patch}" | jq .

curl -s \
     -X PATCH \
     "${HOST_PROXY_API_PATH}/${host}/status" \
     -H "Content-type: application/merge-patch+json" \
     -d "${host_patch}"

oc get baremetalhost -n openshift-machine-api -o yaml "${host}"

$ bash link-machine-and-node.sh custom-master3 worker-3
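The script finds the BareMetalHost through the Machine's metal3.io/BareMetalHost annotation, taking the part after the slash in a value of the form <namespace>/<name>. A minimal sketch of that one step, using shell parameter expansion instead of cut, is shown below. The function name host_from_annotation is hypothetical and is not part of the script.

```shell
# Print the BareMetalHost name from a "metal3.io/BareMetalHost" annotation value.
# Assumption: the value has the form "<namespace>/<name>"; a value without
# a slash is returned unchanged, and an empty value stays empty.
host_from_annotation() {
  printf '%s\n' "${1##*/}"
}

# Usage:
#   host_from_annotation "openshift-machine-api/custom-master3"
```

An empty result here corresponds to the script's "Machine ... is not linked to a host yet" error path.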
Confirm the members of etcdctl by running the following command:
$ oc rsh -n openshift-etcd etcd-worker-3 etcdctl member list -w table
Example output
+----------+---------+----------+----------------+----------------+---------+
|    ID    | STATUS  |   NAME   |   PEER ADDRS   |  CLIENT ADDRS  | LEARNER |
+----------+---------+----------+----------------+----------------+---------+
| 2c18942f | started | worker-3 | 192.168.111.26 | 192.168.111.26 | false   |
| ead4f280 | started | worker-5 | 192.168.111.28 | 192.168.111.28 | false   |
| 79153c5a | started | worker-6 | 192.168.111.29 | 192.168.111.29 | false   |
+----------+---------+----------+----------------+----------------+---------+
Confirm that the etcd Operator has configured all nodes:
$ oc get clusteroperator etcd
Example output
NAME   VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
etcd   4.11.5    True        False         False      22h
Confirm the health of etcdctl:
$ oc rsh -n openshift-etcd etcd-worker-3 etcdctl endpoint health
Example output
192.168.111.26 is healthy: committed proposal: took = 9.105375ms
192.168.111.28 is healthy: committed proposal: took = 9.15205ms
192.168.111.29 is healthy: committed proposal: took = 10.277577ms
Confirm the health of the nodes:
$ oc get Nodes
Example output
NAME       STATUS   ROLES    AGE   VERSION
worker-1   Ready    worker   22h   v1.24.0+3882f8f
master-3   Ready    master   22h   v1.24.0+3882f8f
worker-4   Ready    worker   22h   v1.24.0+3882f8f
master-5   Ready    master   18h   v1.24.0+3882f8f
master-6   Ready    master   40m   v1.24.0+3882f8f
Confirm the health of the ClusterOperators:
$ oc get ClusterOperators
Example output
NAME                                VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                      4.11.5    True        False         False      150m
baremetal                           4.11.5    True        False         False      22h
cloud-controller-manager            4.11.5    True        False         False      22h
cloud-credential                    4.11.5    True        False         False      22h
cluster-autoscaler                  4.11.5    True        False         False      22h
config-operator                     4.11.5    True        False         False      22h
console                             4.11.5    True        False         False      145m
csi-snapshot-controller             4.11.5    True        False         False      22h
dns                                 4.11.5    True        False         False      22h
etcd                                4.11.5    True        False         False      22h
image-registry                      4.11.5    True        False         False      22h
ingress                             4.11.5    True        False         False      22h
insights                            4.11.5    True        False         False      22h
kube-apiserver                      4.11.5    True        False         False      22h
kube-controller-manager             4.11.5    True        False         False      22h
kube-scheduler                      4.11.5    True        False         False      22h
kube-storage-version-migrator       4.11.5    True        False         False      148m
machine-api                         4.11.5    True        False         False      22h
machine-approver                    4.11.5    True        False         False      22h
machine-config                      4.11.5    True        False         False      110m
marketplace                         4.11.5    True        False         False      22h
monitoring                          4.11.5    True        False         False      22h
network                             4.11.5    True        False         False      22h
node-tuning                         4.11.5    True        False         False      22h
openshift-apiserver                 4.11.5    True        False         False      163m
openshift-controller-manager        4.11.5    True        False         False      22h
openshift-samples                   4.11.5    True        False         False      22h
operator-lifecycle-manager          4.11.5    True        False         False      22h
operator-lifecycle-manager-catalog  4.11.5    True        False         False      22h
operator-lifecycle-manager-pkgsvr   4.11.5    True        False         False      22h
service-ca                          4.11.5    True        False         False      22h
storage                             4.11.5    True        False         False      22h
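Scanning more than thirty rows by eye is error-prone, so a filter that prints only the Operators that are not in the expected state can help. The following is a sketch under stated assumptions: the function name unhealthy_operators is hypothetical, and it relies on the six-column layout shown above with a non-empty VERSION column.

```shell
# Print the names of ClusterOperators that are not Available=True,
# Progressing=False, and Degraded=False.
# Reads `oc get clusteroperators` output on stdin and skips the header row.
# Assumption: VERSION is populated on every row; an empty VERSION would
# shift the columns and cause the row to be reported.
unhealthy_operators() {
  awk 'NR > 1 && ($3 != "True" || $4 != "False" || $5 != "False") { print $1 }'
}

# Usage:
#   oc get clusteroperators | unhealthy_operators
```

For the example output above, the function would print nothing, which matches a fully healthy cluster.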
Confirm the ClusterVersion:
$ oc get ClusterVersion
Example output
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.5    True        False         22h     Cluster version is 4.11.5
Additional resources