Chapter 5. Host-level tasks
5.1. Adding a host to the cluster
For information on adding master or node hosts to a cluster, see the Adding hosts to an existing cluster section in the Install and configuration guide.
5.2. Master host tasks
5.2.1. Deprecating a master host
Deprecating a master host removes it from the OpenShift Container Platform environment.
The reasons to deprecate or scale down master hosts include hardware re-sizing or replacing the underlying infrastructure.
Highly available OpenShift Container Platform environments require at least three master hosts and three etcd nodes. Usually, the master hosts are colocated with the etcd services. If you deprecate a master host, you also remove the etcd static pods from that host.
Ensure that the master and etcd services are always deployed in odd numbers due to the voting mechanisms that take place among those services.
5.2.1.1. Creating a master host backup
Perform this backup process before any change to the OpenShift Container Platform infrastructure, such as a system update, upgrade, or any other significant modification. Back up data regularly to ensure that recent data is available if a failure occurs.
OpenShift Container Platform files
The master instances run important services, such as the API and controllers. The /etc/origin/master directory stores many important files:
- The configuration, the API, controllers, services, and more
- Certificates generated by the installation
- All cloud provider-related configuration
- Keys and other authentication files, such as htpasswd if you use htpasswd authentication
- And more
You can customize OpenShift Container Platform services, such as increasing the log level or using proxies. The configuration files are stored in the /etc/sysconfig directory.
Because the masters are also nodes, back up the entire /etc/origin directory.
Procedure
You must perform the following steps on each master node.
- Create a backup of the pod definitions, located in the /etc/origin/node/pods directory.
Create a backup of the master host configuration files:
$ MYBACKUPDIR=/backup/$(hostname)/$(date +%Y%m%d)
$ sudo mkdir -p ${MYBACKUPDIR}/etc/sysconfig
$ sudo cp -aR /etc/origin ${MYBACKUPDIR}/etc
$ sudo cp -aR /etc/sysconfig/ ${MYBACKUPDIR}/etc/sysconfig/

Note: The master configuration file is /etc/origin/master/master-config.yaml.
Warning: The /etc/origin/master/ca.serial.txt file is generated on only the first master listed in the Ansible host inventory. If you deprecate the first master host, copy the /etc/origin/master/ca.serial.txt file to the rest of the master hosts before the process.

Important: In OpenShift Container Platform 3.11 clusters running multiple masters, one of the master nodes includes additional CA certificates in /etc/origin/master, /etc/etcd/ca, and /etc/etcd/generated_certs. These are required for application node and etcd node scale-up operations and would need to be restored on another master node should the originating master become permanently unavailable. These directories are included by default within the backup procedures documented here.

Other important files that need to be considered when planning a backup include:
- /etc/cni/*: Container Network Interface configuration (if used)
- /etc/sysconfig/iptables: Where the iptables rules are stored
- /etc/sysconfig/docker-storage-setup: The input file for the container-storage-setup command
- /etc/sysconfig/docker: The docker configuration file
- /etc/sysconfig/docker-network: docker networking configuration (for example, MTU)
- /etc/sysconfig/docker-storage: docker storage configuration (generated by container-storage-setup)
- /etc/dnsmasq.conf: Main configuration file for dnsmasq
- /etc/dnsmasq.d/*: Different dnsmasq configuration files
- /etc/sysconfig/flanneld: flannel configuration file (if used)
- /etc/pki/ca-trust/source/anchors/: Certificates added to the system (for example, for external registries)
Create a backup of those files:
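A minimal sketch, reusing the ${MYBACKUPDIR} convention from above and covering the files in the preceding table; adjust the copied paths to match the files your hosts actually use:

$ MYBACKUPDIR=/backup/$(hostname)/$(date +%Y%m%d)
$ sudo mkdir -p ${MYBACKUPDIR}/etc/sysconfig
$ sudo mkdir -p ${MYBACKUPDIR}/etc/pki/ca-trust/source/anchors
$ sudo cp -aR /etc/sysconfig/{iptables,docker-*,flanneld} ${MYBACKUPDIR}/etc/sysconfig/
$ sudo cp -aR /etc/dnsmasq* /etc/cni ${MYBACKUPDIR}/etc/
$ sudo cp -aR /etc/pki/ca-trust/source/anchors/* ${MYBACKUPDIR}/etc/pki/ca-trust/source/anchors/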
If a package is accidentally removed or you need to restore a file that is included in an rpm package, having a list of the rhel packages installed on the system can be useful.

Note: If you use Red Hat Satellite features, such as content views or the facts store, provide a proper mechanism to reinstall the missing packages and to keep historical data about the packages installed in the systems.

To create a list of the current rhel packages installed in the system:

$ MYBACKUPDIR=/backup/$(hostname)/$(date +%Y%m%d)
$ sudo mkdir -p ${MYBACKUPDIR}
$ rpm -qa | sort | sudo tee $MYBACKUPDIR/packages.txt

If you used the previous steps, the following files are present in the backup directory:
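A representative layout, assuming the steps above were followed; <hostname> and <date> stand for your own values and the exact contents will differ:

$ sudo find ${MYBACKUPDIR} -mindepth 1 -maxdepth 2
/backup/<hostname>/<date>/etc
/backup/<hostname>/<date>/etc/origin
/backup/<hostname>/<date>/etc/sysconfig
/backup/<hostname>/<date>/etc/dnsmasq.conf
/backup/<hostname>/<date>/etc/cni
/backup/<hostname>/<date>/etc/pki
/backup/<hostname>/<date>/packages.txt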
If needed, you can compress the files to save space:
$ MYBACKUPDIR=/backup/$(hostname)/$(date +%Y%m%d)
$ sudo tar -zcvf /backup/$(hostname)-$(date +%Y%m%d).tar.gz $MYBACKUPDIR
$ sudo rm -Rf ${MYBACKUPDIR}
To create any of these files from scratch, the openshift-ansible-contrib repository contains the backup_master_node.sh script, which performs the previous steps. The script creates a directory on the host where you run the script and copies all the files previously mentioned.
The openshift-ansible-contrib script is not supported by Red Hat, but the reference architecture team performs testing to ensure the code operates as defined and is secure.
You can run the script on every master host with:
$ mkdir ~/git
$ cd ~/git
$ git clone https://github.com/openshift/openshift-ansible-contrib.git
$ cd openshift-ansible-contrib/reference-architecture/day2ops/scripts
$ ./backup_master_node.sh -h
5.2.1.2. Backing up etcd
When you back up etcd, you must back up both the etcd configuration files and the etcd data.
5.2.1.2.1. Backing up etcd configuration files
The etcd configuration files to be preserved are all stored in the /etc/etcd directory of the instances where etcd is running. This includes the etcd configuration file (/etc/etcd/etcd.conf) and the required certificates for cluster communication. All those files are generated at installation time by the Ansible installer.
Procedure
For each etcd member of the cluster, back up the etcd configuration.
$ ssh master-0
# mkdir -p /backup/etcd-config-$(date +%Y%m%d)/
# cp -R /etc/etcd/ /backup/etcd-config-$(date +%Y%m%d)/
Replace master-0 with the name of your etcd member.
The certificates and configuration files on each etcd cluster member are unique.
5.2.1.2.2. Backing up etcd data
Prerequisites
The OpenShift Container Platform installer creates aliases named etcdctl2 (for etcd v2 tasks) and etcdctl3 (for etcd v3 tasks) so that you do not need to type all of the required flags.
However, the etcdctl3 alias does not provide the full endpoint list to the etcdctl command, so you must specify the --endpoints option and list all the endpoints.
Before backing up etcd:
- etcdctl binaries must be available or, in containerized installations, the rhel7/etcd container must be available.
- Ensure that the OpenShift Container Platform API service is running.
- Ensure connectivity with the etcd cluster (port 2379/tcp).
- Ensure the proper certificates to connect to the etcd cluster.
Procedure
While the etcdctl backup command is used to perform the backup, etcd v3 has no concept of a backup. Instead, you either take a snapshot from a live member with the etcdctl snapshot save command or copy the member/snap/db file from an etcd data directory.
The etcdctl backup command rewrites some of the metadata contained in the backup, specifically, the node ID and cluster ID, which means that in the backup, the node loses its former identity. To recreate a cluster from the backup, you create a new, single-node cluster, then add the rest of the nodes to the cluster. The metadata is rewritten to prevent the new node from joining an existing cluster.
Back up the etcd data:
Clusters upgraded from previous versions of OpenShift Container Platform might contain v2 data stores. Back up all etcd data stores.
Obtain the etcd endpoint IP address from the static pod manifest:
$ export ETCD_POD_MANIFEST="/etc/origin/node/pods/etcd.yaml"
$ export ETCD_EP=$(grep https ${ETCD_POD_MANIFEST} | cut -d '/' -f3)

Log in as an administrator:
$ oc login -u system:admin

Obtain the etcd pod name:
$ export ETCD_POD=$(oc get pods -n kube-system | grep -o -m 1 '^master-etcd\S*')

Change to the kube-system project:

$ oc project kube-system

Take a snapshot of the etcd data in the pod and store it locally:
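A sketch of the snapshot command, using the v3 API from inside the etcd pod and the certificate paths used elsewhere in this chapter; the target file name is an example:

$ oc exec ${ETCD_POD} -c etcd -- /bin/bash -c "ETCDCTL_API=3 etcdctl \
    --cert /etc/etcd/peer.crt \
    --key /etc/etcd/peer.key \
    --cacert /etc/etcd/ca.crt \
    --endpoints ${ETCD_EP} \
    snapshot save /var/lib/etcd/snapshot.db"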
Note: You must write the snapshot to a directory under /var/lib/etcd/.
5.2.1.3. Deprecating a master host
Master hosts run important services, such as the OpenShift Container Platform API and controllers services. In order to deprecate a master host, these services must be stopped.
The OpenShift Container Platform API service is an active/active service, so stopping the service does not affect the environment as long as the requests are sent to a separate master server. However, the OpenShift Container Platform controllers service is an active/passive service, where the services use etcd to decide the active master.
Deprecating a master host in a multi-master architecture includes removing the master from the load balancer pool to avoid new connections attempting to use that master. This process depends heavily on the load balancer used. The steps below show the details of removing the master from haproxy. In the event that OpenShift Container Platform is running on a cloud provider, or using an F5 appliance, see the specific product documents to remove the master from rotation.
Procedure
Remove the master host from the backend section in the /etc/haproxy/haproxy.cfg configuration file. For example, if deprecating a master named master-0.example.com using haproxy, ensure the host name is removed from the backend definitions:
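A hypothetical backend stanza after the master-0.example.com line has been deleted; the backend name, ports, and addresses will differ in your environment:

backend mgmt8443
    balance source
    mode tcp
    # the server line for master-0.example.com has been removed
    server master-1.example.com 192.168.55.12:8443 check
    server master-2.example.com 192.168.55.13:8443 check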
Then, restart the haproxy service:

$ sudo systemctl restart haproxy

Once the master is removed from the load balancer, disable the API and controller services by moving their definition files out of the static pod directory /etc/origin/node/pods:

# mkdir -p /etc/origin/node/pods/disabled
# mv /etc/origin/node/pods/controller.yaml /etc/origin/node/pods/disabled/

- Because the master host is a schedulable OpenShift Container Platform node, follow the steps in the Deprecating a node host section.
Remove the master host from the [masters] and [nodes] groups in the /etc/ansible/hosts Ansible inventory file to avoid issues if running any Ansible tasks using that inventory file.

Warning: Deprecating the first master host listed in the Ansible inventory file requires extra precautions.
The /etc/origin/master/ca.serial.txt file is generated on only the first master listed in the Ansible host inventory. If you deprecate the first master host, copy the /etc/origin/master/ca.serial.txt file to the rest of the master hosts before the process.

Important: In OpenShift Container Platform 3.11 clusters running multiple masters, one of the master nodes includes additional CA certificates in /etc/origin/master, /etc/etcd/ca, and /etc/etcd/generated_certs. These are required for application node and etcd node scale-up operations and must be restored on another master node if the CA host master is being deprecated.

The kubernetes service includes the master host IPs as endpoints. To verify that the master has been properly deprecated, review the kubernetes service output and see if the deprecated master has been removed, as in the sketch below. After the master has been successfully deprecated, the host where the master was previously running can be safely deleted.
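A sketch of the check, with hypothetical endpoint values; the deprecated master's IP must no longer appear in the Endpoints list:

$ oc describe svc kubernetes -n default | grep -i endpoints
Endpoints:          192.168.55.12:8443,192.168.55.13:8443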
5.2.1.4. Removing an etcd host
If an etcd host fails beyond restoration, remove it from the cluster.
Steps to be performed on all master hosts
Procedure
Remove the failed etcd host from the etcd cluster. Run the following command, specifying a surviving etcd endpoint and the failed member ID:
# etcdctl3 --endpoints=https://<surviving host IP>:2379 \
    --cacert=/etc/etcd/ca.crt --cert=/etc/etcd/peer.crt --key=/etc/etcd/peer.key \
    member remove <failed member ID>

Restart the master API service on every master:
# master-restart api
# master-restart controllers
Steps to be performed in the current etcd cluster
Procedure
Remove the failed host from the cluster:
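A sketch using the etcdctl2 alias described earlier in this chapter; the member ID shown is hypothetical and is taken from the cluster-health or member list output:

# etcdctl2 cluster-health
# etcdctl2 member remove 8372784203e11288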
Note: The remove command requires the etcd ID, not the hostname.
To ensure the etcd configuration does not use the failed host when the etcd service is restarted, modify the /etc/etcd/etcd.conf file on all remaining etcd hosts and remove the failed host in the value for the ETCD_INITIAL_CLUSTER variable:

# vi /etc/etcd/etcd.conf

For example:
ETCD_INITIAL_CLUSTER=master-0.example.com=https://192.168.55.8:2380,master-1.example.com=https://192.168.55.12:2380,master-2.example.com=https://192.168.55.13:2380

becomes:

ETCD_INITIAL_CLUSTER=master-0.example.com=https://192.168.55.8:2380,master-1.example.com=https://192.168.55.12:2380

Note: Restarting the etcd services is not required, because the failed host is removed using etcdctl.

Modify the Ansible inventory file to reflect the current status of the cluster and to avoid issues when re-running a playbook:
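For example, the [etcd] group after the failed master-2.example.com host is removed; the host names match the ETCD_INITIAL_CLUSTER example above:

[etcd]
master-0.example.com
master-1.example.com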
If you are using Flannel, modify the flanneld service configuration located at /etc/sysconfig/flanneld on every host and remove the etcd host:

FLANNEL_ETCD_ENDPOINTS=https://master-0.example.com:2379,https://master-1.example.com:2379,https://master-2.example.com:2379

Restart the flanneld service:

# systemctl restart flanneld.service
5.2.2. Creating a master host backup
Perform this backup process before any change to the OpenShift Container Platform infrastructure, such as a system update, upgrade, or any other significant modification. Back up data regularly to ensure that recent data is available if a failure occurs.
OpenShift Container Platform files
The master instances run important services, such as the API and controllers. The /etc/origin/master directory stores many important files:
- The configuration, the API, controllers, services, and more
- Certificates generated by the installation
- All cloud provider-related configuration
- Keys and other authentication files, such as htpasswd if you use htpasswd authentication
- And more
You can customize OpenShift Container Platform services, such as increasing the log level or using proxies. The configuration files are stored in the /etc/sysconfig directory.
Because the masters are also nodes, back up the entire /etc/origin directory.
Procedure
You must perform the following steps on each master node.
- Create a backup of the pod definitions, located in the /etc/origin/node/pods directory.
Create a backup of the master host configuration files:
$ MYBACKUPDIR=/backup/$(hostname)/$(date +%Y%m%d)
$ sudo mkdir -p ${MYBACKUPDIR}/etc/sysconfig
$ sudo cp -aR /etc/origin ${MYBACKUPDIR}/etc
$ sudo cp -aR /etc/sysconfig/ ${MYBACKUPDIR}/etc/sysconfig/

Note: The master configuration file is /etc/origin/master/master-config.yaml.
Warning: The /etc/origin/master/ca.serial.txt file is generated on only the first master listed in the Ansible host inventory. If you deprecate the first master host, copy the /etc/origin/master/ca.serial.txt file to the rest of the master hosts before the process.

Important: In OpenShift Container Platform 3.11 clusters running multiple masters, one of the master nodes includes additional CA certificates in /etc/origin/master, /etc/etcd/ca, and /etc/etcd/generated_certs. These are required for application node and etcd node scale-up operations and would need to be restored on another master node should the originating master become permanently unavailable. These directories are included by default within the backup procedures documented here.

Other important files that need to be considered when planning a backup include:
- /etc/cni/*: Container Network Interface configuration (if used)
- /etc/sysconfig/iptables: Where the iptables rules are stored
- /etc/sysconfig/docker-storage-setup: The input file for the container-storage-setup command
- /etc/sysconfig/docker: The docker configuration file
- /etc/sysconfig/docker-network: docker networking configuration (for example, MTU)
- /etc/sysconfig/docker-storage: docker storage configuration (generated by container-storage-setup)
- /etc/dnsmasq.conf: Main configuration file for dnsmasq
- /etc/dnsmasq.d/*: Different dnsmasq configuration files
- /etc/sysconfig/flanneld: flannel configuration file (if used)
- /etc/pki/ca-trust/source/anchors/: Certificates added to the system (for example, for external registries)
Create a backup of those files:
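A minimal sketch, reusing the ${MYBACKUPDIR} convention from above and covering the files in the preceding table; adjust the copied paths to match the files your hosts actually use:

$ MYBACKUPDIR=/backup/$(hostname)/$(date +%Y%m%d)
$ sudo mkdir -p ${MYBACKUPDIR}/etc/sysconfig
$ sudo mkdir -p ${MYBACKUPDIR}/etc/pki/ca-trust/source/anchors
$ sudo cp -aR /etc/sysconfig/{iptables,docker-*,flanneld} ${MYBACKUPDIR}/etc/sysconfig/
$ sudo cp -aR /etc/dnsmasq* /etc/cni ${MYBACKUPDIR}/etc/
$ sudo cp -aR /etc/pki/ca-trust/source/anchors/* ${MYBACKUPDIR}/etc/pki/ca-trust/source/anchors/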
If a package is accidentally removed or you need to restore a file that is included in an rpm package, having a list of the rhel packages installed on the system can be useful.

Note: If you use Red Hat Satellite features, such as content views or the facts store, provide a proper mechanism to reinstall the missing packages and to keep historical data about the packages installed in the systems.

To create a list of the current rhel packages installed in the system:

$ MYBACKUPDIR=/backup/$(hostname)/$(date +%Y%m%d)
$ sudo mkdir -p ${MYBACKUPDIR}
$ rpm -qa | sort | sudo tee $MYBACKUPDIR/packages.txt

If you used the previous steps, the following files are present in the backup directory:
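A representative layout, assuming the steps above were followed; <hostname> and <date> stand for your own values and the exact contents will differ:

$ sudo find ${MYBACKUPDIR} -mindepth 1 -maxdepth 2
/backup/<hostname>/<date>/etc
/backup/<hostname>/<date>/etc/origin
/backup/<hostname>/<date>/etc/sysconfig
/backup/<hostname>/<date>/etc/dnsmasq.conf
/backup/<hostname>/<date>/etc/cni
/backup/<hostname>/<date>/etc/pki
/backup/<hostname>/<date>/packages.txt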
If needed, you can compress the files to save space:
$ MYBACKUPDIR=/backup/$(hostname)/$(date +%Y%m%d)
$ sudo tar -zcvf /backup/$(hostname)-$(date +%Y%m%d).tar.gz $MYBACKUPDIR
$ sudo rm -Rf ${MYBACKUPDIR}
To create any of these files from scratch, the openshift-ansible-contrib repository contains the backup_master_node.sh script, which performs the previous steps. The script creates a directory on the host where you run the script and copies all the files previously mentioned.
The openshift-ansible-contrib script is not supported by Red Hat, but the reference architecture team performs testing to ensure the code operates as defined and is secure.
You can run the script on every master host with:
$ mkdir ~/git
$ cd ~/git
$ git clone https://github.com/openshift/openshift-ansible-contrib.git
$ cd openshift-ansible-contrib/reference-architecture/day2ops/scripts
$ ./backup_master_node.sh -h
5.2.3. Restoring a master host backup
After creating a backup of important master host files, if they become corrupted or accidentally removed, you can restore the files by copying them back to the master host, ensuring they contain the proper content, and restarting the affected services.
Procedure
Restore the /etc/origin/master/master-config.yaml file:

# MYBACKUPDIR=/backup/$(hostname)/$(date +%Y%m%d)
# cp /etc/origin/master/master-config.yaml /etc/origin/master/master-config.yaml.old
# cp ${MYBACKUPDIR}/etc/origin/master/master-config.yaml /etc/origin/master/master-config.yaml
# master-restart api
# master-restart controllers

Warning: Restarting the master services can lead to downtime. However, you can remove the master host from the highly available load balancer pool, then perform the restore operation. Once the service has been properly restored, you can add the master host back to the load balancer pool.
Note: Perform a full reboot of the affected instance to restore the iptables configuration.

If you cannot restart OpenShift Container Platform because packages are missing, reinstall the packages.
Get the list of the currently installed packages:
$ rpm -qa | sort > /tmp/current_packages.txt

View the differences between the package lists:
$ diff /tmp/current_packages.txt ${MYBACKUPDIR}/packages.txt
> ansible-2.4.0.0-5.el7.noarch

Reinstall the missing packages:
# yum reinstall -y <packages>

Replace <packages> with the packages that are different between the package lists.
Restore a system certificate by copying the certificate to the /etc/pki/ca-trust/source/anchors/ directory and executing update-ca-trust:

$ MYBACKUPDIR=/backup/$(hostname)/$(date +%Y%m%d)
$ sudo cp ${MYBACKUPDIR}/etc/pki/ca-trust/source/anchors/<certificate> /etc/pki/ca-trust/source/anchors/
$ sudo update-ca-trust

Replace <certificate> with the file name of the system certificate to restore.
Note: Always ensure the user ID and group ID are restored when the files are copied back, as well as the SELinux context.
5.3. Node host tasks
5.3.1. Deprecating a node host
The procedure is the same whether deprecating an infrastructure node or an application node.
Prerequisites
Ensure enough capacity is available to migrate the existing pods from the node set to be removed. Removing an infrastructure node is advised only when at least two more nodes will stay online after the infrastructure node is removed.
Procedure
List all available nodes to find the node to deprecate:
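A hypothetical listing; node names other than ocp-infra-node-b7pl and ocp-infra-node-rghb are invented for illustration:

$ oc get nodes
NAME                  STATUS    ROLES     AGE       VERSION
ocp-app-node-8rjf     Ready     compute   11d       v1.11.0+d4cacc0
ocp-infra-node-b7pl   Ready     infra     11d       v1.11.0+d4cacc0
ocp-infra-node-mwhc   Ready     infra     11d       v1.11.0+d4cacc0
ocp-infra-node-rghb   Ready     infra     11d       v1.11.0+d4cacc0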
As an example, this topic deprecates the ocp-infra-node-b7pl infrastructure node.

Describe the node and its running services:
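The command is:

$ oc describe node ocp-infra-node-b7pl

In its output, the Non-terminated Pods section is the part that matters here.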
The output shows that the node is running two pods: router-1-vzlzq and docker-registry-1-5szjs. Two more infrastructure nodes are available to migrate these two pods.

Note: The cluster described above is a highly available cluster; this means both the router and docker-registry services are running on all infrastructure nodes.

Mark a node as unschedulable and evacuate all of its pods:
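A sketch of the drain invocation for this example, with abbreviated output:

$ oc adm drain ocp-infra-node-b7pl --delete-local-data
node "ocp-infra-node-b7pl" cordoned
pod "docker-registry-1-5szjs" evicted
pod "router-1-vzlzq" evicted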
If the pod has attached local storage (for example, EmptyDir), the --delete-local-data option must be provided. Generally, pods running in production should use the local storage only for temporary or cache files, but not for anything important or persistent. For regular storage, applications should use object storage or persistent volumes. In this case, the docker-registry pod's local storage is empty, because object storage is used instead to store the container images.

Note: The above operation deletes existing pods running on the node. Then, new pods are created according to the replication controller.

In general, every application should be deployed with a deployment configuration, which creates pods using the replication controller.

oc adm drain will not delete any bare pods (pods that are neither mirror pods nor managed by a ReplicationController, ReplicaSet, DaemonSet, StatefulSet, or a job). To do so, the --force option is required. Be aware that the bare pods will not be recreated on other nodes and data may be lost during this operation.

The example below shows the output of the replication controller of the registry:
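The command to inspect the replication controller is:

$ oc describe rc/docker-registry-1

The Events section at the bottom of its output records the creation of the replacement pod.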
The event at the bottom of the output displays information about new pod creation. So, when listing all pods:
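A hypothetical listing; only the docker-registry-1-dprp5 and router-1-2gshr names are given by this example, the rest are invented for illustration:

$ oc get pods
NAME                      READY     STATUS    RESTARTS   AGE
docker-registry-1-dprp5   1/1       Running   0          1m
docker-registry-1-kr8jq   1/1       Running   0          11d
docker-registry-1-ncpl2   1/1       Running   0          11d
router-1-2gshr            0/1       Pending   0          1m
router-1-dkmfd            1/1       Running   0          11d
router-1-lflcf            1/1       Running   0          11d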
The docker-registry-1-5szjs and router-1-vzlzq pods that were running on the now deprecated node are no longer available. Instead, two new pods have been created: docker-registry-1-dprp5 and router-1-2gshr. As shown above, the new router pod is router-1-2gshr, but it is in the Pending state. This is because every node can run only one single router, which is bound to ports 80 and 443 of the host.

When observing the newly created registry pod, you can see that it was created on the ocp-infra-node-rghb node, which is different from the deprecated node.

The only difference between deprecating an infrastructure node and an application node is that once the infrastructure node is evacuated, and if there is no plan to replace that node, the services running on infrastructure nodes can be scaled down:
$ oc scale dc/router --replicas 2
deploymentconfig "router" scaled
$ oc scale dc/docker-registry --replicas 2
deploymentconfig "docker-registry" scaled

Now, every infrastructure node is running only one kind of each pod.
Note: To provide a full highly available cluster, at least three infrastructure nodes should always be available.
To verify that the scheduling on the node is disabled:
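For example (AGE and VERSION values are hypothetical):

$ oc get node ocp-infra-node-b7pl
NAME                  STATUS                     ROLES     AGE       VERSION
ocp-infra-node-b7pl   Ready,SchedulingDisabled   infra     11d       v1.11.0+d4cacc0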
And that the node does not contain any pods:
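For example:

$ oc adm manage-node ocp-infra-node-b7pl --list-pods
Listing matched pods on node: ocp-infra-node-b7pl

An empty list confirms the evacuation.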
Remove the infrastructure instance from the backend section in the /etc/haproxy/haproxy.cfg configuration file:
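For example, hypothetical router backends after the ocp-infra-node-b7pl lines are deleted; the backend names, ports, and addresses will differ in your environment:

backend router80
    balance source
    mode tcp
    server ocp-infra-node-mwhc 192.168.55.93:80 check
    server ocp-infra-node-rghb 192.168.55.94:80 check

backend router443
    balance source
    mode tcp
    server ocp-infra-node-mwhc 192.168.55.93:443 check
    server ocp-infra-node-rghb 192.168.55.94:443 check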
Then, restart the haproxy service:

$ sudo systemctl restart haproxy

Remove the node from the cluster after all pods are evicted with the command:
$ oc delete node ocp-infra-node-b7pl
node "ocp-infra-node-b7pl" deleted
For more information on evacuating and draining pods or nodes, see the Node maintenance section.
5.3.1.1. Replacing a node host
In the event that a node needs to be added in place of the deprecated node, follow the Adding hosts to an existing cluster section.
5.3.2. Creating a node host backup
Creating a backup of a node host is a different use case from backing up a master host. Because master hosts contain many important files, creating a backup is highly recommended. However, the nature of nodes is that anything special is replicated over the nodes in case of failover, and they typically do not contain data that is necessary to run an environment. If a backup of a node contains something necessary to run an environment, then creating a backup is recommended.
The backup process is to be performed before any change to the infrastructure, such as a system update, upgrade, or any other significant modification. Backups should be performed on a regular basis to ensure the most recent data is available if a failure occurs.
OpenShift Container Platform files
Node instances run applications in the form of pods, which are based on containers. The /etc/origin/ and /etc/origin/node directories house important files, such as:
- The configuration of the node services
- Certificates generated by the installation
- Cloud provider-related configuration
- Keys and other authentication files, such as the dnsmasq configuration
The OpenShift Container Platform services can be customized to increase the log level, use proxies, and more, and the configuration files are stored in the /etc/sysconfig directory.
Procedure
Create a backup of the node configuration files:
$ MYBACKUPDIR=/backup/$(hostname)/$(date +%Y%m%d)
$ sudo mkdir -p ${MYBACKUPDIR}/etc/sysconfig
$ sudo cp -aR /etc/origin ${MYBACKUPDIR}/etc
$ sudo cp -aR /etc/sysconfig/atomic-openshift-node ${MYBACKUPDIR}/etc/sysconfig/

OpenShift Container Platform uses specific files that must be taken into account when planning the backup policy, including:
- /etc/cni/*: Container Network Interface configuration (if used)
- /etc/sysconfig/iptables: Where the iptables rules are stored
- /etc/sysconfig/docker-storage-setup: The input file for the container-storage-setup command
- /etc/sysconfig/docker: The docker configuration file
- /etc/sysconfig/docker-network: docker networking configuration (for example, MTU)
- /etc/sysconfig/docker-storage: docker storage configuration (generated by container-storage-setup)
- /etc/dnsmasq.conf: Main configuration file for dnsmasq
- /etc/dnsmasq.d/*: Different dnsmasq configuration files
- /etc/sysconfig/flanneld: flannel configuration file (if used)
- /etc/pki/ca-trust/source/anchors/: Certificates added to the system (for example, for external registries)
To create those files:
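A minimal sketch, reusing the ${MYBACKUPDIR} convention from above and covering the files in the preceding table; adjust the copied paths to match the files your hosts actually use:

$ MYBACKUPDIR=/backup/$(hostname)/$(date +%Y%m%d)
$ sudo mkdir -p ${MYBACKUPDIR}/etc/sysconfig
$ sudo mkdir -p ${MYBACKUPDIR}/etc/pki/ca-trust/source/anchors
$ sudo cp -aR /etc/sysconfig/{iptables,docker-*,flanneld} ${MYBACKUPDIR}/etc/sysconfig/
$ sudo cp -aR /etc/dnsmasq* /etc/cni ${MYBACKUPDIR}/etc/
$ sudo cp -aR /etc/pki/ca-trust/source/anchors/* ${MYBACKUPDIR}/etc/pki/ca-trust/source/anchors/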
If a package is accidentally removed, or a file included in an rpm package should be restored, having a list of the rhel packages installed on the system can be useful.

Note: If using Red Hat Satellite features, such as content views or the facts store, provide a proper mechanism to reinstall the missing packages and to keep historical data about the packages installed in the systems.
To create a list of the current rhel packages installed in the system:

$ MYBACKUPDIR=/backup/$(hostname)/$(date +%Y%m%d)
$ sudo mkdir -p ${MYBACKUPDIR}
$ rpm -qa | sort | sudo tee $MYBACKUPDIR/packages.txt

The following files should now be present in the backup directory:
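A representative layout, assuming the steps above were followed; <hostname> and <date> stand for your own values and the exact contents will differ:

$ sudo find ${MYBACKUPDIR} -mindepth 1 -maxdepth 2
/backup/<hostname>/<date>/etc
/backup/<hostname>/<date>/etc/origin
/backup/<hostname>/<date>/etc/sysconfig
/backup/<hostname>/<date>/etc/dnsmasq.conf
/backup/<hostname>/<date>/etc/cni
/backup/<hostname>/<date>/etc/pki
/backup/<hostname>/<date>/packages.txt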
If needed, the files can be compressed to save space:
$ MYBACKUPDIR=/backup/$(hostname)/$(date +%Y%m%d)
$ sudo tar -zcvf /backup/$(hostname)-$(date +%Y%m%d).tar.gz $MYBACKUPDIR
$ sudo rm -Rf ${MYBACKUPDIR}
To create any of these files from scratch, the openshift-ansible-contrib repository contains the backup_master_node.sh script, which performs the previous steps. The script creates a directory on the host running the script and copies all the files previously mentioned.
The openshift-ansible-contrib script is not supported by Red Hat, but the reference architecture team performs testing to ensure the code operates as defined and is secure.
The script can be executed on every master host with:
$ mkdir ~/git
$ cd ~/git
$ git clone https://github.com/openshift/openshift-ansible-contrib.git
$ cd openshift-ansible-contrib/reference-architecture/day2ops/scripts
$ ./backup_master_node.sh -h
5.3.3. Restoring a node host backup
After creating a backup of important node host files, if they become corrupted or accidentally removed, you can restore the files by copying them back, ensuring they contain the proper content, and restarting the affected services.
Procedure
Restore the /etc/origin/node/node-config.yaml file:

# MYBACKUPDIR=/backup/$(hostname)/$(date +%Y%m%d)
# cp /etc/origin/node/node-config.yaml /etc/origin/node/node-config.yaml.old
# cp ${MYBACKUPDIR}/etc/origin/node/node-config.yaml /etc/origin/node/node-config.yaml
# reboot
Restarting the services can lead to downtime. See Node maintenance for tips on how to ease the process.
Perform a full reboot of the affected instance to restore the iptables configuration.
If you cannot restart OpenShift Container Platform because packages are missing, reinstall the packages.
Get the list of the currently installed packages:
$ rpm -qa | sort > /tmp/current_packages.txt

View the differences between the package lists:
$ diff /tmp/current_packages.txt ${MYBACKUPDIR}/packages.txt
> ansible-2.4.0.0-5.el7.noarch

Reinstall the missing packages:
# yum reinstall -y <packages>

Replace <packages> with the packages that are different between the package lists.
Restore a system certificate by copying the certificate to the /etc/pki/ca-trust/source/anchors/ directory and executing update-ca-trust:

$ MYBACKUPDIR=/backup/$(hostname)/$(date +%Y%m%d)
$ sudo cp ${MYBACKUPDIR}/etc/pki/ca-trust/source/anchors/<certificate> /etc/pki/ca-trust/source/anchors/
$ sudo update-ca-trust

Replace <certificate> with the file name of the system certificate to restore.
Note: Always ensure proper user ID and group ID are restored when the files are copied back, as well as the SELinux context.
5.3.4. Node maintenance and next steps
See the Managing nodes or Managing pods topics for various node management options.
A node can reserve a portion of its resources to be used by specific components. These include the kubelet, kube-proxy, Docker, or other remaining system components such as sshd and NetworkManager. See the Allocating node resources section in the Cluster Administrator guide for more information.
5.4. etcd tasks
5.4.1. etcd backup
etcd is the key value store for all object definitions, as well as the persistent master state. Other components watch for changes, then bring themselves into the desired state.
OpenShift Container Platform versions prior to 3.5 use etcd version 2 (v2), while 3.5 and later use version 3 (v3). The data model between the two versions of etcd is different. etcd v3 can use both the v2 and v3 data models, whereas etcd v2 can only use the v2 data model. In an etcd v3 server, the v2 and v3 data stores exist in parallel and are independent.
For both v2 and v3 operations, you can use the ETCDCTL_API environment variable to use the correct API:
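For example, a sketch; the etcdctl2 and etcdctl3 aliases described later in this section wrap exactly this pattern:

$ ETCDCTL_API=2 etcdctl member list
$ ETCDCTL_API=3 etcdctl member list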
See the Migrating etcd Data (v2 to v3) section in the OpenShift Container Platform 3.7 documentation for information about how to migrate to v3.
In OpenShift Container Platform version 3.10 and later, you can either install etcd on separate hosts or run it as a static pod on your master hosts. If you do not specify separate etcd hosts, etcd runs as a static pod on master hosts. Because of this difference, the backup process is different if you use static pods.
The etcd backup process is composed of two different procedures:
- Configuration backup: Including the required etcd configuration and certificates
- Data backup: Including both the v2 and v3 data models.
You can perform the data backup process on any host that has connectivity to the etcd cluster, where the proper certificates are provided, and where the etcdctl tool is installed.
The backup files must be copied to an external system, ideally outside the OpenShift Container Platform environment, and then encrypted.
Note that the etcd backup still has all the references to current storage volumes. When you restore etcd, OpenShift Container Platform starts launching the previous pods on nodes and reattaching the same storage. This process is no different than the process of when you remove a node from the cluster and add a new one back in its place. Anything attached to that node is reattached to the pods on whatever nodes they are rescheduled to.
5.4.1.1. Backing up etcd
When you back up etcd, you must back up both the etcd configuration files and the etcd data.
5.4.1.1.1. Backing up etcd configuration files
The etcd configuration files to be preserved are all stored in the /etc/etcd directory of the instances where etcd is running. This includes the etcd configuration file (/etc/etcd/etcd.conf) and the required certificates for cluster communication. All those files are generated at installation time by the Ansible installer.
Procedure
For each etcd member of the cluster, back up the etcd configuration.
$ ssh master-0
# mkdir -p /backup/etcd-config-$(date +%Y%m%d)/
# cp -R /etc/etcd/ /backup/etcd-config-$(date +%Y%m%d)/
Replace master-0 with the name of your etcd member.
The certificates and configuration files on each etcd cluster member are unique.
5.4.1.1.2. Backing up etcd data
Prerequisites
The OpenShift Container Platform installer creates aliases named etcdctl2 (for etcd v2 tasks) and etcdctl3 (for etcd v3 tasks) so that you do not need to type all of the required flags.
However, the etcdctl3 alias does not provide the full endpoint list to the etcdctl command, so you must specify the --endpoints option and list all the endpoints.
Before backing up etcd:
- etcdctl binaries must be available or, in containerized installations, the rhel7/etcd container must be available.
- Ensure that the OpenShift Container Platform API service is running.
- Ensure connectivity with the etcd cluster (port 2379/tcp).
- Ensure the proper certificates to connect to the etcd cluster.
Procedure
While the etcdctl backup command is used to perform the backup, etcd v3 has no concept of a backup. Instead, you either take a snapshot from a live member with the etcdctl snapshot save command or copy the member/snap/db file from an etcd data directory.
The etcdctl backup command rewrites some of the metadata contained in the backup, specifically, the node ID and cluster ID, which means that in the backup, the node loses its former identity. To recreate a cluster from the backup, you create a new, single-node cluster, then add the rest of the nodes to the cluster. The metadata is rewritten to prevent the new node from joining an existing cluster.
Back up the etcd data:
Clusters upgraded from previous versions of OpenShift Container Platform might contain v2 data stores. Back up all etcd data stores.
Obtain the etcd endpoint IP address from the static pod manifest:
$ export ETCD_POD_MANIFEST="/etc/origin/node/pods/etcd.yaml"
$ export ETCD_EP=$(grep https ${ETCD_POD_MANIFEST} | cut -d '/' -f3)

Log in as an administrator:
$ oc login -u system:admin

Obtain the etcd pod name:
$ export ETCD_POD=$(oc get pods -n kube-system | grep -o -m 1 '^master-etcd\S*')

Change to the kube-system project:

$ oc project kube-system

Take a snapshot of the etcd data in the pod and store it locally:
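A sketch of the snapshot command, using the v3 API from inside the etcd pod and the certificate paths used elsewhere in this chapter; the target file name is an example:

$ oc exec ${ETCD_POD} -c etcd -- /bin/bash -c "ETCDCTL_API=3 etcdctl \
    --cert /etc/etcd/peer.crt \
    --key /etc/etcd/peer.key \
    --cacert /etc/etcd/ca.crt \
    --endpoints ${ETCD_EP} \
    snapshot save /var/lib/etcd/snapshot.db"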
Note: You must write the snapshot to a directory under /var/lib/etcd/.
5.4.2. Restoring etcd
5.4.2.1. Restoring the etcd configuration file
If an etcd host has become corrupted and the /etc/etcd/etcd.conf file is lost, restore it using the following procedure:
Access your etcd host:
$ ssh master-0

Replace master-0 with the name of your etcd host.
Copy the backup etcd.conf file to /etc/etcd/:

# cp /backup/etcd-config-<timestamp>/etcd/etcd.conf /etc/etcd/etcd.conf

Set the required permissions and SELinux context on the file:
# restorecon -RvF /etc/etcd/etcd.conf
In this example, the backup file is stored in the /backup/etcd-config-<timestamp>/etcd/etcd.conf path, which can be an external NFS share, S3 bucket, or other storage solution.
After the etcd configuration file is restored, you must restart the static pod. This is done after you restore the etcd data.
5.4.2.2. Restoring etcd data
Before restoring etcd on a static pod:
- etcdctl binaries must be available or, in containerized installations, the rhel7/etcd container must be available.

You can install the etcdctl binary with the etcd package by running the following command:

# yum install etcd

The package also installs the systemd service. Disable and mask the service so that it does not run as a systemd service when etcd runs in a static pod. By disabling and masking the service, you ensure that you do not accidentally start it and prevent it from automatically restarting when you reboot the system.

# systemctl disable etcd.service
# systemctl mask etcd.service
To restore etcd on a static pod:
If the pod is running, stop the etcd pod by moving the pod manifest YAML file to another directory:
# mkdir -p /etc/origin/node/pods-stopped
# mv /etc/origin/node/pods/etcd.yaml /etc/origin/node/pods-stopped

Move all old data:
# mv /var/lib/etcd /var/lib/etcd.old

You use etcdctl to recreate the data on the node where you restore the pod.
Restore the etcd snapshot to the mount path for the etcd pod:
# export ETCDCTL_API=3

Obtain the appropriate values for your cluster from your backup etcd.conf file and substitute them into the restore command sketched below.
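A sketch of the restore invocation, assuming a hypothetical snapshot path and the single member master-0.example.com with IP 192.168.55.8; take the name, cluster, token, and URL values from your backup etcd.conf file:

# etcdctl snapshot restore /backup/etcd/snapshot.db \
    --data-dir /var/lib/etcd \
    --name master-0.example.com \
    --initial-cluster "master-0.example.com=https://192.168.55.8:2380" \
    --initial-cluster-token "etcd-cluster-1" \
    --initial-advertise-peer-urls https://192.168.55.8:2380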
Set the required permissions and SELinux context on the data directory:

# restorecon -RvF /var/lib/etcd/

Restart the etcd pod by moving the pod manifest YAML file to the required directory:
# mv /etc/origin/node/pods-stopped/etcd.yaml /etc/origin/node/pods/
5.4.3. Replacing an etcd host
To replace an etcd host, scale up the etcd cluster and then remove the host. This process ensures that you keep quorum if you lose an etcd host during the replacement procedure.
The etcd cluster must maintain a quorum during the replacement operation. This means that at least one host must be in operation at all times.
If the host replacement operation occurs while the etcd cluster maintains a quorum, cluster operations are usually not affected. If a large amount of etcd data must replicate, some operations might slow down.
Before you start any procedure involving the etcd cluster, you must have a backup of the etcd data and configuration files so that you can restore the cluster if the procedure fails.
5.4.4. Scaling etcd
You can scale the etcd cluster vertically by adding more resources to the etcd hosts or horizontally by adding more etcd hosts.
Due to the voting system etcd uses, the cluster must always contain an odd number of members.
Having a cluster with an odd number of etcd hosts can account for fault tolerance. Having an odd number of etcd hosts does not change the number needed for a quorum but increases the tolerance for failure. For example, with a cluster of three members, quorum is two, which leaves a failure tolerance of one. This ensures the cluster continues to operate if two of the members are healthy.
Having an in-production cluster of three etcd hosts is recommended.
The new host requires a fresh Red Hat Enterprise Linux version 7 dedicated host. The etcd storage should be located on an SSD disk to achieve maximum performance and on a dedicated disk mounted in /var/lib/etcd.
Prerequisites
- Before you add a new etcd host, perform a backup of both etcd configuration and data to prevent data loss.
Check the current etcd cluster status to avoid adding new hosts to an unhealthy cluster. Run this command:
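A sketch using the v3 API and the peer certificates, with the chapter's example endpoints:

# ETCDCTL_API=3 etcdctl --cert="/etc/etcd/peer.crt" --key="/etc/etcd/peer.key" \
    --cacert="/etc/etcd/ca.crt" \
    --endpoints="https://master-0.example.com:2379,https://master-1.example.com:2379,https://master-2.example.com:2379" \
    endpoint health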
Before running the scaleup playbook, ensure the new host is registered to the proper Red Hat software channels, for example:
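A registration sketch; the credentials and pool ID are placeholders:

# subscription-manager register --username=<username> --password=<password>
# subscription-manager attach --pool=<poolid>
# subscription-manager repos --disable="*"
# subscription-manager repos \
    --enable=rhel-7-server-rpms \
    --enable=rhel-7-server-extras-rpms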
etcd is hosted in the rhel-7-server-extras-rpms software channel.

Make sure all unused etcd members are removed from the etcd cluster. This must be completed before running the scaleup playbook.

List the etcd members:
# etcdctl --cert="/etc/etcd/peer.crt" --key="/etc/etcd/peer.key" \
    --cacert="/etc/etcd/ca.crt" --endpoints=ETCD_LISTEN_CLIENT_URLS member list -w table

Copy the unused etcd member ID, if applicable.
Remove the unused member by specifying its ID in the following command:
# etcdctl --cert="/etc/etcd/peer.crt" --key="/etc/etcd/peer.key" \
    --cacert="/etc/etcd/ca.crt" --endpoints=ETCD_LISTEN_CLIENT_URLS member remove UNUSED_ETCD_MEMBER_ID
Upgrade etcd and iptables on the current etcd nodes:
# yum update etcd iptables-services

- Back up the /etc/etcd configuration for the etcd hosts.
- If the new etcd members will also be OpenShift Container Platform nodes, add the desired number of hosts to the cluster.
- The rest of this procedure assumes you added one host, but if you add multiple hosts, perform all steps on each host.
5.4.4.1. Adding a new etcd host using Ansible
Procedure
In the Ansible inventory file, create a new group named [new_etcd] and add the new host. Then, add the new_etcd group as a child of the [OSEv3] group, for example:
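The inventory sketch below uses the hypothetical new host etcd0.example.com, the same name used in the Flannel example later in this section:

[OSEv3:children]
masters
nodes
etcd
new_etcd

[etcd]
master-0.example.com
master-1.example.com
master-2.example.com

[new_etcd]
etcd0.example.com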
Note: Replace the old etcd host entry with the new etcd host entry in the inventory file. While replacing the older etcd host, you must create a copy of the /etc/etcd/ca/ directory. Alternatively, you can redeploy the etcd CA and certificates before scaling up the etcd hosts.

From the host that installed OpenShift Container Platform and hosts the Ansible inventory file, change to the playbook directory and run the etcd scaleup playbook:

$ cd /usr/share/ansible/openshift-ansible
$ ansible-playbook playbooks/openshift-etcd/scaleup.yml

After the playbook runs, modify the inventory file to reflect the current status by moving the new etcd host from the [new_etcd] group to the [etcd] group.

If you use Flannel, modify the flanneld service configuration on every OpenShift Container Platform host, located at /etc/sysconfig/flanneld, to include the new etcd host:
FLANNEL_ETCD_ENDPOINTS=https://master-0.example.com:2379,https://master-1.example.com:2379,https://master-2.example.com:2379,https://etcd0.example.com:2379

Restart the flanneld service:

# systemctl restart flanneld.service
5.4.4.2. Manually adding a new etcd host
If you do not run etcd as static pods on master nodes, you might need to add another etcd host.
Procedure
Modify the current etcd cluster
To create the etcd certificates, run the openssl command, replacing the values with those from your environment.
Create some environment variables:
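A sketch of the variables, consistent with how ${PREFIX}, $CN, $SAN, and ${NEW_ETCD_IP} are used below; the host name and IP are hypothetical:

# export NEW_ETCD_HOSTNAME="etcd0.example.com"
# export NEW_ETCD_IP="192.168.55.21"
# export CN=$NEW_ETCD_HOSTNAME
# export SAN="IP:${NEW_ETCD_IP}, DNS:${NEW_ETCD_HOSTNAME}"
# export PREFIX="/etc/etcd/generated_certs/etcd-$CN/"
# export OPENSSLCFG="/etc/etcd/ca/openssl.cnf"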
Note: The custom openssl extensions used as etcd_v3_ca_* include the $SAN environment variable as subjectAltName. See /etc/etcd/ca/openssl.cnf for more information.

Create the directory to store the configuration and certificates:
# mkdir -p ${PREFIX}

Create the server certificate request and sign it (server.csr and server.crt):
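A sketch of the openssl invocations, assuming the variables above and the etcd_ca, etcd_v3_req, and etcd_v3_ca_server sections of the installer-generated /etc/etcd/ca/openssl.cnf:

# openssl req -new -config ${OPENSSLCFG} \
    -keyout ${PREFIX}server.key \
    -out ${PREFIX}server.csr \
    -reqexts etcd_v3_req -batch -nodes \
    -subj /CN=$CN
# openssl ca -name etcd_ca -config ${OPENSSLCFG} \
    -out ${PREFIX}server.crt \
    -in ${PREFIX}server.csr \
    -extensions etcd_v3_ca_server -batch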
Create the peer certificate request and sign it (peer.csr and peer.crt):
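The peer certificate follows the same pattern as the sketch above, with the etcd_v3_ca_peer extension:

# openssl req -new -config ${OPENSSLCFG} \
    -keyout ${PREFIX}peer.key \
    -out ${PREFIX}peer.csr \
    -reqexts etcd_v3_req -batch -nodes \
    -subj /CN=$CN
# openssl ca -name etcd_ca -config ${OPENSSLCFG} \
    -out ${PREFIX}peer.crt \
    -in ${PREFIX}peer.csr \
    -extensions etcd_v3_ca_peer -batch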
Copy the current etcd configuration and ca.crt files from the current node as examples to modify later:

# cp /etc/etcd/etcd.conf ${PREFIX}
# cp /etc/etcd/ca.crt ${PREFIX}

While still on the surviving etcd host, add the new host to the cluster. To add additional etcd members to the cluster, you must first adjust the default localhost peer in the peerURLs value for the first member.

Get the member ID for the first member using the member list command:
# etcdctl --cert-file=/etc/etcd/peer.crt \
    --key-file=/etc/etcd/peer.key \
    --ca-file=/etc/etcd/ca.crt \
    --peers="https://172.18.1.18:2379,https://172.18.9.202:2379,https://172.18.0.75:2379" \
    member list

Ensure that you specify the URLs of only active etcd members in the --peers parameter value.
Obtain the IP address where etcd listens for cluster peers:
$ ss -l4n | grep 2380

Update the value of peerURLs using the etcdctl member update command by passing the member ID and IP address obtained from the previous steps:

# etcdctl --cert-file=/etc/etcd/peer.crt \
    --key-file=/etc/etcd/peer.key \
    --ca-file=/etc/etcd/ca.crt \
    --peers="https://172.18.1.18:2379,https://172.18.9.202:2379,https://172.18.0.75:2379" \
    member update 511b7fb6cc0001 https://172.18.1.18:2380
- Re-run the member list command and ensure the peer URLs no longer include localhost.
Add the new host to the etcd cluster. Note that the new host is not yet configured, so the status stays as unstarted until you configure the new host.

Warning: You must add each member and bring it online one at a time. When you add each additional member to the cluster, you must adjust the peerURLs list for the current peers. The peerURLs list grows by one for each member added. The etcdctl member add command outputs the values that you must set in the etcd.conf file as you add each member, as described in the following instructions.
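The original etcdctl member add example is not preserved in this copy. A sketch follows, reusing the v2 etcdctl flags from the earlier member list example; the member label and peer URL (10.3.9.222) come from the explanation below, and the sample output values (the member ID and the existing peer's address) are purely illustrative:

# etcdctl --cert-file=/etc/etcd/peer.crt \
    --key-file=/etc/etcd/peer.key \
    --ca-file=/etc/etcd/ca.crt \
    --peers="https://172.18.1.18:2379,https://172.18.9.202:2379,https://172.18.0.75:2379" \
    member add 10.3.9.222 https://10.3.9.222:2380

Added member named 10.3.9.222 with ID 4e1db163a21d7651 to cluster

ETCD_NAME="10.3.9.222"
ETCD_INITIAL_CLUSTER="10.3.9.221=https://10.3.9.221:2380,10.3.9.222=https://10.3.9.222:2380"
ETCD_INITIAL_CLUSTER_STATE="existing"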
In this command, 10.3.9.222 is a label for the etcd member. You can specify the host name, IP address, or a simple name.
Update the sample ${PREFIX}/etcd.conf file.

Replace the following values with the values generated in the previous step:
- ETCD_NAME
- ETCD_INITIAL_CLUSTER
- ETCD_INITIAL_CLUSTER_STATE
Modify the following variables with the new host IP from the output of the previous step. You can use ${NEW_ETCD_IP} as the value:

ETCD_LISTEN_PEER_URLS
ETCD_LISTEN_CLIENT_URLS
ETCD_INITIAL_ADVERTISE_PEER_URLS
ETCD_ADVERTISE_CLIENT_URLS

- If you previously used the member system as an etcd node, you must overwrite the current values in the /etc/etcd/etcd.conf file.
Check the file for syntax errors or missing IP addresses; otherwise, the etcd service might fail:

# vi ${PREFIX}/etcd.conf
- On the node that hosts the installation files, update the [etcd] hosts group in the /etc/ansible/hosts inventory file. Remove the old etcd hosts and add the new ones.

Create a tgz file that contains the certificates, the sample configuration file, and the CA, and copy it to the new host:

# tar -czvf /etc/etcd/generated_certs/${CN}.tgz -C ${PREFIX} .
# scp /etc/etcd/generated_certs/${CN}.tgz ${CN}:/tmp/
Modify the new etcd host
Install iptables-services to provide iptables utilities to open the required ports for etcd:

# yum install -y iptables-services
OS_FIREWALL_ALLOWfirewall rules to allow etcd to communicate:- Port 2379/tcp for clients
Port 2380/tcp for peer communication
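The original rule listing is not preserved in this copy. A sketch of iptables rules consistent with this step follows; the chain name and ports come from this procedure, while enabling iptables.service and the exact match options are assumptions:

# systemctl enable iptables.service --now
# iptables -N OS_FIREWALL_ALLOW
# iptables -t filter -I INPUT -j OS_FIREWALL_ALLOW
# iptables -A OS_FIREWALL_ALLOW -p tcp -m state --state NEW -m tcp --dport 2379 -j ACCEPT
# iptables -A OS_FIREWALL_ALLOW -p tcp -m state --state NEW -m tcp --dport 2380 -j ACCEPT
# iptables-save | tee /etc/sysconfig/iptables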
Note: In this example, a new chain OS_FIREWALL_ALLOW is created, which is the standard naming that the OpenShift Container Platform installer uses for firewall rules.

Warning: If the environment is hosted in an IaaS environment, modify the security groups for the instance to allow incoming traffic to those ports as well.
Install etcd:
# yum install -y etcd

Ensure that version etcd-2.3.7-4.el7.x86_64 or greater is installed.

Ensure that the etcd service is not running by removing the etcd pod definition:
# mkdir -p /etc/origin/node/pods-stopped
# mv /etc/origin/node/pods/* /etc/origin/node/pods-stopped/

Remove any etcd configuration and data:

# rm -Rf /etc/etcd/*
# rm -Rf /var/lib/etcd/*

Extract the certificates and configuration files:
# tar xzvf /tmp/etcd0.example.com.tgz -C /etc/etcd/

Start etcd on the new host:

# systemctl enable etcd --now

Verify that the host is part of the cluster and the current cluster health:
If you use the v2 etcd API, run the following command:
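The original command block is not preserved in this copy; the following is a sketch using the same v2 etcdctl flags as earlier in this procedure, with the endpoint list extended to include the new host, and the cluster-health subcommand to report member health:

# etcdctl --cert-file=/etc/etcd/peer.crt \
    --key-file=/etc/etcd/peer.key \
    --ca-file=/etc/etcd/ca.crt \
    --peers="https://master-0.example.com:2379,https://master-1.example.com:2379,https://master-2.example.com:2379,https://etcd0.example.com:2379" \
    cluster-health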
If you use the v3 etcd API, run the following command:
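Again a sketch, this time with the v3 flag spellings (--cert, --key, --cacert, --endpoints) and the endpoint health subcommand:

# ETCDCTL_API=3 etcdctl --cert=/etc/etcd/peer.crt \
    --key=/etc/etcd/peer.key \
    --cacert=/etc/etcd/ca.crt \
    --endpoints="https://master-0.example.com:2379,https://master-1.example.com:2379,https://master-2.example.com:2379,https://etcd0.example.com:2379" \
    endpoint health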
Modify each OpenShift Container Platform master
Modify the master configuration in the etcdClientInfo section of the /etc/origin/master/master-config.yaml file on every master. Add the new etcd host to the list of the etcd servers that OpenShift Container Platform uses to store the data, and remove any failed etcd hosts, as in the sketch below.
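The original YAML snippet is not preserved in this copy. The following sketch shows the shape of the section; the certificate file names are the typical defaults and are assumptions, while the urls list mirrors the endpoints used elsewhere in this procedure:

etcdClientInfo:
  ca: master.etcd-ca.crt
  certFile: master.etcd-client.crt
  keyFile: master.etcd-client.key
  urls:
    - https://master-0.example.com:2379
    - https://master-1.example.com:2379
    - https://master-2.example.com:2379
    - https://etcd0.example.com:2379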
Restart the master API service on every master:
# master-restart api
# master-restart controllers

Warning: The number of etcd nodes must be odd, so you must add at least two hosts.
If you use Flannel, modify the flanneld service configuration located at /etc/sysconfig/flanneld on every OpenShift Container Platform host to include the new etcd host:

FLANNEL_ETCD_ENDPOINTS=https://master-0.example.com:2379,https://master-1.example.com:2379,https://master-2.example.com:2379,https://etcd0.example.com:2379

Restart the flanneld service:

# systemctl restart flanneld.service
5.4.5. Removing an etcd host
If an etcd host fails beyond restoration, remove it from the cluster.
Steps to be performed on all master hosts
Procedure
Remove the failed etcd host from the etcd cluster. Run the following command once for each failed etcd member:

# etcdctl3 --endpoints=https://<surviving host IP>:2379 \
    --cacert=/etc/etcd/ca.crt \
    --cert=/etc/etcd/peer.crt \
    --key=/etc/etcd/peer.key \
    member remove <failed member ID>
# master-restart api
# master-restart controllers
Steps to be performed in the current etcd cluster
Procedure
Remove the failed host from the cluster:
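The original command block is not preserved in this copy. A sketch follows, using the same v2 etcdctl flags as the earlier examples in this section; replace the placeholders with your surviving host IP and the failed member ID:

# etcdctl --cert-file=/etc/etcd/peer.crt \
    --key-file=/etc/etcd/peer.key \
    --ca-file=/etc/etcd/ca.crt \
    --peers="https://<surviving host IP>:2379" \
    member remove <failed member ID>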
The remove command requires the etcd ID, not the hostname.
To ensure that the etcd configuration does not use the failed host when the etcd service is restarted, modify the /etc/etcd/etcd.conf file on all remaining etcd hosts and remove the failed host from the value of the ETCD_INITIAL_CLUSTER variable:

# vi /etc/etcd/etcd.conf

For example:
ETCD_INITIAL_CLUSTER=master-0.example.com=https://192.168.55.8:2380,master-1.example.com=https://192.168.55.12:2380,master-2.example.com=https://192.168.55.13:2380

becomes:
ETCD_INITIAL_CLUSTER=master-0.example.com=https://192.168.55.8:2380,master-1.example.com=https://192.168.55.12:2380

Note: Restarting the etcd services is not required, because the failed host is removed using etcdctl.

Modify the Ansible inventory file to reflect the current status of the cluster and to avoid issues when re-running a playbook:
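The original inventory snippet is not preserved in this copy. A minimal sketch of the [etcd] hosts group after removing the failed host, using the hostnames from the ETCD_INITIAL_CLUSTER example above:

[etcd]
master-0.example.com
master-1.example.com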
If you are using Flannel, modify the flanneld service configuration located at /etc/sysconfig/flanneld on every host and remove the etcd host:

FLANNEL_ETCD_ENDPOINTS=https://master-0.example.com:2379,https://master-1.example.com:2379,https://master-2.example.com:2379

Restart the flanneld service:

# systemctl restart flanneld.service