Chapter 3. Restoring OpenShift Container Platform components
3.1. Overview
In OpenShift Container Platform, you can restore your cluster and its components by recreating cluster elements, including nodes and applications, from separate storage.
To restore a cluster, you must first back it up.
The following process describes a generic way of restoring applications and the OpenShift Container Platform cluster. It cannot take into account custom requirements. You might need to take additional actions to restore your cluster.
3.2. Restoring a cluster
To restore a cluster, first reinstall OpenShift Container Platform.
Procedure
- Reinstall OpenShift Container Platform in the same way that you originally installed it.
- Run all of your custom post-installation steps, such as changing services outside of the control of OpenShift Container Platform or installing extra services like monitoring agents.
3.3. Restoring a master host backup
After creating a backup of important master host files, if they become corrupted or accidentally removed, you can restore them by copying the files back to the master host, verifying that they contain the proper content, and restarting the affected services.
Procedure
Restore the /etc/origin/master/master-config.yaml file:

# MYBACKUPDIR=/backup/$(hostname)/$(date +%Y%m%d)
# cp /etc/origin/master/master-config.yaml /etc/origin/master/master-config.yaml.old
# cp /backup/$(hostname)/$(date +%Y%m%d)/origin/master/master-config.yaml /etc/origin/master/master-config.yaml
# systemctl restart atomic-openshift-master-api
# systemctl restart atomic-openshift-master-controllers
Warning: Restarting the master services can lead to downtime. However, you can remove the master host from the highly available load balancer pool, then perform the restore operation. Once the service has been properly restored, you can add the master host back to the load balancer pool.
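Before adding the master host back to the load balancer pool, you can confirm that the master API responds. This is an optional sanity check, assuming the API listens on the default port 8443; the hostname is illustrative:

# curl -k https://master-0.example.com:8443/healthz
ok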
Note: Perform a full reboot of the affected instance to restore the iptables configuration.

If you cannot restart OpenShift Container Platform because packages are missing, reinstall the packages.
Get the list of currently installed packages:
$ rpm -qa | sort > /tmp/current_packages.txt
View the differences between the package lists:
$ diff /tmp/current_packages.txt ${MYBACKUPDIR}/packages.txt
> ansible-2.4.0.0-5.el7.noarch
Reinstall the missing packages:
# yum reinstall -y <packages>
Replace <packages> with the packages that are different between the package lists.
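Because diff prefixes lines that exist only in the backup list with >, you can feed those package names to yum directly. A minimal sketch, assuming the backup's packages.txt was produced with the same rpm -qa | sort format as the current list:

# yum reinstall -y $(diff /tmp/current_packages.txt ${MYBACKUPDIR}/packages.txt | awk '/^>/ {print $2}')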
Restore a system certificate by copying the certificate to the /etc/pki/ca-trust/source/anchors/ directory and running update-ca-trust:

$ MYBACKUPDIR=/backup/$(hostname)/$(date +%Y%m%d)
$ sudo cp ${MYBACKUPDIR}/external_certificates/my_company.crt /etc/pki/ca-trust/source/anchors/
$ sudo update-ca-trust
Note: Always ensure that the user ID, group ID, and SELinux context are restored when the files are copied back.
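To verify the ownership and SELinux context of a restored file, you can use ls with the -Z flag, for example:

# ls -lZ /etc/origin/master/master-config.yaml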
3.4. Restoring a node host backup
After creating a backup of important node host files, if they become corrupted or accidentally removed, you can restore them by copying the files back, verifying that they contain the proper content, and restarting the affected services.
Procedure
Restore the /etc/origin/node/node-config.yaml file:

# MYBACKUPDIR=/backup/$(hostname)/$(date +%Y%m%d)
# cp /etc/origin/node/node-config.yaml /etc/origin/node/node-config.yaml.old
# cp /backup/$(hostname)/$(date +%Y%m%d)/etc/origin/node/node-config.yaml /etc/origin/node/node-config.yaml
# systemctl restart atomic-openshift-node
Restarting the services can lead to downtime. See Node maintenance for tips on how to ease the process.
Note: Perform a full reboot of the affected instance to restore the iptables configuration.
If you cannot restart OpenShift Container Platform because packages are missing, reinstall the packages.
Get the list of currently installed packages:
$ rpm -qa | sort > /tmp/current_packages.txt
View the differences between the package lists:
$ diff /tmp/current_packages.txt ${MYBACKUPDIR}/packages.txt
> ansible-2.4.0.0-5.el7.noarch
Reinstall the missing packages:
# yum reinstall -y <packages>
Replace <packages> with the packages that are different between the package lists.
Restore a system certificate by copying the certificate to the /etc/pki/ca-trust/source/anchors/ directory and running update-ca-trust:

$ MYBACKUPDIR=/backup/$(hostname)/$(date +%Y%m%d)
$ sudo cp ${MYBACKUPDIR}/etc/pki/ca-trust/source/anchors/my_company.crt /etc/pki/ca-trust/source/anchors/
$ sudo update-ca-trust
Note: Always ensure that the proper user ID, group ID, and SELinux context are restored when the files are copied back.
3.5. Restoring etcd
The restore procedure for etcd configuration files replaces the appropriate files, then restarts the service.
If an etcd host has become corrupted and the /etc/etcd/etcd.conf file is lost, restore it by running:
$ ssh master-0
# cp /backup/yesterday/master-0-files/etcd.conf /etc/etcd/etcd.conf
# restorecon -Rv /etc/etcd/etcd.conf
# systemctl restart etcd.service
In this example, the backup file is stored at the /backup/yesterday/master-0-files/etcd.conf path, which can reside on an external NFS share, an S3 bucket, or another storage solution.
3.5.1. Restoring etcd v2 and v3 data
The following process restores healthy data files and starts the etcd cluster as a single node, then adds the rest of the nodes if an etcd cluster is required.
Procedure
Stop all etcd services:
# systemctl stop etcd.service
To ensure the proper backup is restored, delete the etcd directories:
To back up the current etcd data before you delete the directory, run the following command:
# mv /var/lib/etcd /var/lib/etcd.old
# mkdir /var/lib/etcd
# chown -R etcd.etcd /var/lib/etcd/
# restorecon -Rv /var/lib/etcd/
Or, to delete the directory and the etcd data, run the following command:
# rm -Rf /var/lib/etcd/*
Note: In an all-in-one cluster, the etcd data directory is located in the /var/lib/origin/openshift.local.etcd directory.
Restore a healthy backup data file to each of the etcd nodes. Perform this step on all etcd hosts, including master hosts collocated with etcd.
# cp -R /backup/etcd-xxx/* /var/lib/etcd/
# mv /var/lib/etcd/db /var/lib/etcd/member/snap/db
# chcon -R --reference /backup/etcd-xxx/* /var/lib/etcd/
# chown -R etcd:etcd /var/lib/etcd/
Run the etcd service on each host, forcing a new cluster.

This creates a custom file for the etcd service, which overrides the execution command, adding the --force-new-cluster option:

# mkdir -p /etc/systemd/system/etcd.service.d/
# echo "[Service]" > /etc/systemd/system/etcd.service.d/temp.conf
# echo "ExecStart=" >> /etc/systemd/system/etcd.service.d/temp.conf
# sed -n '/ExecStart/s/"$/ --force-new-cluster"/p' \
    /usr/lib/systemd/system/etcd.service \
    >> /etc/systemd/system/etcd.service.d/temp.conf
# systemctl daemon-reload
# systemctl restart etcd
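To confirm that systemd picked up the drop-in, you can print the merged unit definition; systemctl cat shows the base unit followed by any drop-in files. This is an optional check:

# systemctl cat etcd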
Check for error messages:
$ journalctl -fu etcd.service
Check for health status:
# etcdctl2 cluster-health
member 5ee217d17301 is healthy: got healthy result from https://192.168.55.8:2379
cluster is healthy
Restart the etcd service in cluster mode:
# rm -f /etc/systemd/system/etcd.service.d/temp.conf
# systemctl daemon-reload
# systemctl restart etcd
Check for health status and member list:
# etcdctl2 cluster-health
member 5ee217d17301 is healthy: got healthy result from https://192.168.55.8:2379
cluster is healthy

# etcdctl2 member list
5ee217d17301: name=master-0.example.com peerURLs=http://localhost:2380 clientURLs=https://192.168.55.8:2379 isLeader=true
- After the first instance is running, you can restore the rest of your etcd servers.
3.5.1.1. Fix the peerURLs parameter
After restoring the data and creating a new cluster, the peerURLs parameter shows localhost instead of the IP address where etcd is listening for peer communication:
# etcdctl2 member list
5ee217d17301: name=master-0.example.com peerURLs=http://localhost:2380 clientURLs=https://192.168.55.8:2379 isLeader=true
Procedure
Get the member ID using the etcdctl member list command:

# etcdctl member list
Get the IP where etcd listens for peer communication:
$ ss -l4n | grep 2380
Update the member information with that IP:
# etcdctl2 member update 5ee217d17301 https://192.168.55.8:2380
Updated member with ID 5ee217d17301 in cluster
To verify, check that the IP is in the member list:
$ etcdctl2 member list
5ee217d17301: name=master-0.example.com peerURLs=https://192.168.55.8:2380 clientURLs=https://192.168.55.8:2379 isLeader=true
3.5.2. Restoring etcd for v3
The restore procedure for v3 data is similar to the restore procedure for the v2 data.
Snapshot integrity may be optionally verified at restore time. If the snapshot is taken with etcdctl snapshot save, it will have an integrity hash that is checked by etcdctl snapshot restore. If the snapshot is copied from the data directory, there is no integrity hash and it will only restore by using --skip-hash-check.
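For example, restoring from a snapshot that was copied from the data directory might look like the following sketch; the backup path and cluster values mirror the restore command shown later in this procedure, and --skip-hash-check simply disables the integrity verification:

# etcdctl3 snapshot restore /backup/etcd-xxxxxx/db \
    --skip-hash-check \
    --data-dir /var/lib/etcd \
    --name master-0.example.com \
    --initial-cluster "master-0.example.com=https://192.168.55.8:2380" \
    --initial-cluster-token "etcd-cluster-1" \
    --initial-advertise-peer-urls https://192.168.55.8:2380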
The procedure to restore only the v3 data must be performed on a single etcd host. You can then add the rest of the nodes to the cluster.
Procedure
Stop all etcd services:
# systemctl stop etcd.service
Clear all old data, because etcdctl recreates it on the node where the restore procedure is performed:

# rm -Rf /var/lib/etcd
Run the snapshot restore command, substituting the values from the /etc/etcd/etcd.conf file:

# etcdctl3 snapshot restore /backup/etcd-xxxxxx/backup.db \
    --data-dir /var/lib/etcd \
    --name master-0.example.com \
    --initial-cluster "master-0.example.com=https://192.168.55.8:2380" \
    --initial-cluster-token "etcd-cluster-1" \
    --initial-advertise-peer-urls https://192.168.55.8:2380

2017-10-03 08:55:32.440779 I | mvcc: restore compact to 1041269
2017-10-03 08:55:32.468244 I | etcdserver/membership: added member 40bef1f6c79b3163 [https://192.168.55.8:2380] to cluster 26841ebcf610583c
Restore permissions and SELinux context to the restored files:

# chown -R etcd.etcd /var/lib/etcd/
# restorecon -Rv /var/lib/etcd
Start the etcd service:
# systemctl start etcd
Check for any error messages:
$ journalctl -fu etcd.service
3.6. Adding an etcd node
After you restore etcd, you can add more etcd nodes to the cluster. You can add an etcd host either by using an Ansible playbook or by performing the steps manually.
3.6.1. Adding a new etcd host using Ansible
Procedure
In the Ansible inventory file, create a new group named [new_etcd] and add the new host. Then, add the new_etcd group as a child of the [OSEv3] group:

[OSEv3:children]
masters
nodes
etcd
new_etcd

... [OUTPUT ABBREVIATED] ...

[etcd]
master-0.example.com
master-1.example.com
master-2.example.com

[new_etcd]
etcd0.example.com
From the host that installed OpenShift Container Platform and hosts the Ansible inventory file, run the etcd scaleup playbook:

$ ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/openshift-etcd/scaleup.yml
After the playbook runs, modify the inventory file to reflect the current status by moving the new etcd host from the [new_etcd] group to the [etcd] group:

[OSEv3:children]
masters
nodes
etcd
new_etcd

... [OUTPUT ABBREVIATED] ...

[etcd]
master-0.example.com
master-1.example.com
master-2.example.com
etcd0.example.com
If you use the service catalog, you must update its list of etcd servers:
$ oc edit ds apiserver -n kube-service-catalog
Add the FQDN for the new etcd node to the --etcd-servers argument. This argument contains a comma-separated list.
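The resulting argument might look like the following sketch, reusing the host names from this example; the exact flag layout in the daemon set spec can differ:

--etcd-servers=https://master-0.example.com:2379,https://master-1.example.com:2379,https://master-2.example.com:2379,https://etcd0.example.com:2379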
If you use Flannel, modify the flanneld service configuration on every OpenShift Container Platform host, located at /etc/sysconfig/flanneld, to include the new etcd host:

FLANNEL_ETCD_ENDPOINTS=https://master-0.example.com:2379,https://master-1.example.com:2379,https://master-2.example.com:2379,https://etcd0.example.com:2379
Restart the flanneld service:

# systemctl restart flanneld.service
3.6.2. Manually adding a new etcd host
Procedure
Modify the current etcd cluster
To create the etcd certificates, run the openssl command, replacing the values with those from your environment.
Create some environment variables:
export NEW_ETCD_HOSTNAME="etcd0.example.com"
export NEW_ETCD_IP="192.168.55.21"

export CN=$NEW_ETCD_HOSTNAME
export SAN="IP:${NEW_ETCD_IP}, DNS:${NEW_ETCD_HOSTNAME}"
export PREFIX="/etc/etcd/generated_certs/etcd-$CN/"
export OPENSSLCFG="/etc/etcd/ca/openssl.cnf"
Note: The custom openssl extensions used as etcd_v3_ca_* include the $SAN environment variable as subjectAltName. See /etc/etcd/ca/openssl.cnf for more information.

Create the directory to store the configuration and certificates:
# mkdir -p ${PREFIX}
Create the server certificate request and sign it (server.csr and server.crt):

# openssl req -new -config ${OPENSSLCFG} \
    -keyout ${PREFIX}server.key \
    -out ${PREFIX}server.csr \
    -reqexts etcd_v3_req -batch -nodes \
    -subj /CN=$CN

# openssl ca -name etcd_ca -config ${OPENSSLCFG} \
    -out ${PREFIX}server.crt \
    -in ${PREFIX}server.csr \
    -extensions etcd_v3_ca_server -batch
Create the peer certificate request and sign it (peer.csr and peer.crt):

# openssl req -new -config ${OPENSSLCFG} \
    -keyout ${PREFIX}peer.key \
    -out ${PREFIX}peer.csr \
    -reqexts etcd_v3_req -batch -nodes \
    -subj /CN=$CN

# openssl ca -name etcd_ca -config ${OPENSSLCFG} \
    -out ${PREFIX}peer.crt \
    -in ${PREFIX}peer.csr \
    -extensions etcd_v3_ca_peer -batch
Copy the current etcd configuration and ca.crt files from the current node as examples to modify later:

# cp /etc/etcd/etcd.conf ${PREFIX}
# cp /etc/etcd/ca.crt ${PREFIX}
While still on the surviving etcd host, add the new host to the cluster. To add additional etcd members to the cluster, you must first adjust the default localhost peer in the peerURLs value for the first member.

Get the member ID for the first member using the member list command:

# etcdctl --cert-file=/etc/etcd/peer.crt \
    --key-file=/etc/etcd/peer.key \
    --ca-file=/etc/etcd/ca.crt \
    --peers="https://172.18.1.18:2379,https://172.18.9.202:2379,https://172.18.0.75:2379" \
    member list
Ensure that you specify the URLs of only active etcd members in the --peers parameter value.
Obtain the IP address where etcd listens for cluster peers:
$ ss -l4n | grep 2380
Update the value of peerURLs by using the etcdctl member update command, passing the member ID and IP address obtained from the previous steps:

# etcdctl --cert-file=/etc/etcd/peer.crt \
    --key-file=/etc/etcd/peer.key \
    --ca-file=/etc/etcd/ca.crt \
    --peers="https://172.18.1.18:2379,https://172.18.9.202:2379,https://172.18.0.75:2379" \
    member update 511b7fb6cc0001 https://172.18.1.18:2380
Re-run the member list command and ensure the peer URLs no longer include localhost.
Add the new host to the etcd cluster. Note that the new host is not yet configured, so the status stays as unstarted until you configure the new host.

Warning: You must add each member and bring it online one at a time. When you add each additional member to the cluster, you must adjust the peerURLs list for the current peers. The peerURLs list grows by one for each member added. The etcdctl member add command outputs the values that you must set in the etcd.conf file as you add each member, as described in the following instructions.

# etcdctl -C https://${CURRENT_ETCD_HOST}:2379 \
    --ca-file=/etc/etcd/ca.crt \
    --cert-file=/etc/etcd/peer.crt \
    --key-file=/etc/etcd/peer.key member add ${NEW_ETCD_HOSTNAME} https://${NEW_ETCD_IP}:2380

Added member named 10.3.9.222 with ID 4e1db163a21d7651 to cluster

ETCD_NAME="<NEW_ETCD_HOSTNAME>"
ETCD_INITIAL_CLUSTER="<NEW_ETCD_HOSTNAME>=https://<NEW_HOST_IP>:2380,<CLUSTERMEMBER1_NAME>=https://<CLUSTERMEMBER1_IP>:2380,<CLUSTERMEMBER2_NAME>=https://<CLUSTERMEMBER2_IP>:2380,<CLUSTERMEMBER3_NAME>=https://<CLUSTERMEMBER3_IP>:2380"
ETCD_INITIAL_CLUSTER_STATE="existing"
In this line, 10.3.9.222 is a label for the etcd member. You can specify the host name, IP address, or a simple name.
Update the sample ${PREFIX}/etcd.conf file.

Replace the following values with the values generated in the previous step:

- ETCD_NAME
- ETCD_INITIAL_CLUSTER
- ETCD_INITIAL_CLUSTER_STATE
Modify the following variables with the new host IP from the output of the previous step. You can use ${NEW_ETCD_IP} as the value.

ETCD_LISTEN_PEER_URLS
ETCD_LISTEN_CLIENT_URLS
ETCD_INITIAL_ADVERTISE_PEER_URLS
ETCD_ADVERTISE_CLIENT_URLS
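One way to make these edits in place is a sed substitution. This is a sketch, assuming the copied etcd.conf still references the surviving host's IP (192.168.55.8 in this example) in those variables:

# sed -i "s/192.168.55.8/${NEW_ETCD_IP}/g" ${PREFIX}/etcd.conf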
- If you previously used the member system as an etcd node, you must overwrite the current values in the /etc/etcd/etcd.conf file.
Check the file for syntax errors or missing IP addresses, otherwise the etcd service might fail:
# vi ${PREFIX}/etcd.conf
On the node that hosts the installation files, update the [etcd] hosts group in the /etc/ansible/hosts inventory file. Remove the old etcd hosts and add the new ones.

Create a tgz file that contains the certificates, the sample configuration file, and the ca, and copy it to the new host:

# tar -czvf /etc/etcd/generated_certs/${CN}.tgz -C ${PREFIX} .
# scp /etc/etcd/generated_certs/${CN}.tgz ${CN}:/tmp/
Modify the new etcd host
Install iptables-services to provide iptables utilities to open the required ports for etcd:

# yum install -y iptables-services
Create the OS_FIREWALL_ALLOW firewall rules to allow etcd to communicate:

- Port 2379/tcp for clients
- Port 2380/tcp for peer communication

# systemctl enable iptables.service --now
# iptables -N OS_FIREWALL_ALLOW
# iptables -t filter -I INPUT -j OS_FIREWALL_ALLOW
# iptables -A OS_FIREWALL_ALLOW -p tcp -m state --state NEW -m tcp --dport 2379 -j ACCEPT
# iptables -A OS_FIREWALL_ALLOW -p tcp -m state --state NEW -m tcp --dport 2380 -j ACCEPT
# iptables-save | tee /etc/sysconfig/iptables
Note: In this example, a new chain OS_FIREWALL_ALLOW is created, which is the standard naming the OpenShift Container Platform installer uses for firewall rules.

Warning: If the environment is hosted in an IaaS environment, modify the security groups for the instance to allow incoming traffic to those ports as well.
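To verify that the chain and its rules are active, you can list them; this is an optional check:

# iptables -L OS_FIREWALL_ALLOW -n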
Install etcd:
# yum install -y etcd
Ensure that version etcd-2.3.7-4.el7.x86_64 or greater is installed.

Ensure the etcd service is not running:
# systemctl disable etcd --now
Remove any etcd configuration and data:
# rm -Rf /etc/etcd/*
# rm -Rf /var/lib/etcd/*
Extract the certificates and configuration files:
# tar xzvf /tmp/etcd0.example.com.tgz -C /etc/etcd/
Modify the file ownership permissions:
# chown -R etcd:etcd /etc/etcd/*
# chown -R etcd:etcd /var/lib/etcd/
Start etcd on the new host:
# systemctl enable etcd --now
Verify that the host is part of the cluster and the current cluster health:
If you use the etcd v2 API, run the following command:

# etcdctl --cert-file=/etc/etcd/peer.crt \
    --key-file=/etc/etcd/peer.key \
    --ca-file=/etc/etcd/ca.crt \
    --peers="https://master-0.example.com:2379,\
    https://master-1.example.com:2379,\
    https://master-2.example.com:2379,\
    https://etcd0.example.com:2379"\
    cluster-health

member 5ee217d19001 is healthy: got healthy result from https://192.168.55.12:2379
member 2a529ba1840722c0 is healthy: got healthy result from https://192.168.55.8:2379
member 8b8904727bf526a5 is healthy: got healthy result from https://192.168.55.21:2379
member ed4f0efd277d7599 is healthy: got healthy result from https://192.168.55.13:2379
cluster is healthy
If you use the etcd v3 API, run the following command:

# ETCDCTL_API=3 etcdctl --cert="/etc/etcd/peer.crt" \
    --key=/etc/etcd/peer.key \
    --cacert="/etc/etcd/ca.crt" \
    --endpoints="https://master-0.example.com:2379,\
    https://master-1.example.com:2379,\
    https://master-2.example.com:2379,\
    https://etcd0.example.com:2379"\
    endpoint health

https://master-0.example.com:2379 is healthy: successfully committed proposal: took = 5.011358ms
https://master-1.example.com:2379 is healthy: successfully committed proposal: took = 1.305173ms
https://master-2.example.com:2379 is healthy: successfully committed proposal: took = 1.388772ms
https://etcd0.example.com:2379 is healthy: successfully committed proposal: took = 1.498829ms
Modify each OpenShift Container Platform master
Modify the master configuration in the etcdClientInfo section of the /etc/origin/master/master-config.yaml file on every master. Add the new etcd host to the list of the etcd servers OpenShift Container Platform uses to store the data, and remove any failed etcd hosts:

etcdClientInfo:
  ca: master.etcd-ca.crt
  certFile: master.etcd-client.crt
  keyFile: master.etcd-client.key
  urls:
    - https://master-0.example.com:2379
    - https://master-1.example.com:2379
    - https://master-2.example.com:2379
    - https://etcd0.example.com:2379
Restart the master API service:
On every master:
# systemctl restart atomic-openshift-master-api
Or, on a single master cluster installation:
# systemctl restart atomic-openshift-master
Warning: The number of etcd nodes must be odd, so you must add at least two hosts.
If you use Flannel, modify the flanneld service configuration, located at /etc/sysconfig/flanneld on every OpenShift Container Platform host, to include the new etcd host:

FLANNEL_ETCD_ENDPOINTS=https://master-0.example.com:2379,https://master-1.example.com:2379,https://master-2.example.com:2379,https://etcd0.example.com:2379
Restart the flanneld service:

# systemctl restart flanneld.service
3.7. Bringing OpenShift Container Platform services back online
After you finish your changes, bring OpenShift Container Platform back online.
Procedure
On each OpenShift Container Platform master, restore your master and node configuration from backup and enable and restart all relevant services:
# cp ${MYBACKUPDIR}/etc/sysconfig/atomic-openshift-master-api /etc/sysconfig/atomic-openshift-master-api
# cp ${MYBACKUPDIR}/etc/sysconfig/atomic-openshift-master-controllers /etc/sysconfig/atomic-openshift-master-controllers
# cp ${MYBACKUPDIR}/etc/origin/master/master-config.yaml.<timestamp> /etc/origin/master/master-config.yaml
# cp ${MYBACKUPDIR}/etc/origin/node/node-config.yaml.<timestamp> /etc/origin/node/node-config.yaml
# cp ${MYBACKUPDIR}/etc/origin/master/scheduler.json.<timestamp> /etc/origin/master/scheduler.json
# systemctl enable atomic-openshift-master-api
# systemctl enable atomic-openshift-master-controllers
# systemctl enable atomic-openshift-node
# systemctl start atomic-openshift-master-api
# systemctl start atomic-openshift-master-controllers
# systemctl start atomic-openshift-node
- On each OpenShift Container Platform node, restore your node-config.yaml file from backup and enable and restart the atomic-openshift-node service:
# cp /etc/origin/node/node-config.yaml.<timestamp> /etc/origin/node/node-config.yaml
# systemctl enable atomic-openshift-node
# systemctl start atomic-openshift-node
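After the services start, you can optionally confirm that the masters and nodes rejoined the cluster:

# oc get nodes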
3.8. Restoring a project
To restore a project, create the new project, then restore any exported files by running oc create -f pods.json. However, restoring a project from scratch requires a specific order because some objects depend on others. For example, you must create the configmaps before you create any pods.
Procedure
If the project was exported as a single file, import it by running the following commands:
$ oc new-project <projectname>
$ oc create -f project.yaml
$ oc create -f secret.yaml
$ oc create -f serviceaccount.yaml
$ oc create -f pvc.yaml
$ oc create -f rolebindings.yaml
Warning: Some resources, such as pods and default service accounts, can fail to be created.
3.9. Restoring application data
You can restore application data by using the oc rsync command, assuming rsync is installed within the container image. The Red Hat rhel7 base image contains rsync. Therefore, all images that are based on rhel7 contain it as well. See Troubleshooting and Debugging CLI Operations - rsync.
This is a generic restoration of application data and does not take into account application-specific backup procedures, for example, special export and import procedures for database systems.
Other means of restoration might exist depending on the type of the persistent volume you use, for example, Cinder, NFS, or Gluster.
Procedure
Example of restoring a Jenkins deployment’s application data
Verify the backup:
$ ls -la /tmp/jenkins-backup/
total 8
drwxrwxr-x.  3 user user   20 Sep  6 11:14 .
drwxrwxrwt. 17 root root 4096 Sep  6 11:16 ..
drwxrwsrwx. 12 user user 4096 Sep  6 11:14 jenkins
Use the oc rsync tool to copy the data into the running pod:

$ oc rsync /tmp/jenkins-backup/jenkins jenkins-1-37nux:/var/lib
Note: Depending on the application, you might be required to restart the application.
Optionally, restart the application with new data:
$ oc delete pod jenkins-1-37nux
Alternatively, you can scale down the deployment to 0, and then up again:
$ oc scale --replicas=0 dc/jenkins
$ oc scale --replicas=1 dc/jenkins
3.10. Restoring Persistent Volume Claims
This topic describes two methods for restoring data. The first involves deleting the file, then placing the file back in the expected location. The second example shows migrating persistent volume claims. The migration would occur in the event that the storage needs to be moved or in a disaster scenario when the backend storage no longer exists.
Check the restore procedures for the specific application for any steps required to restore data to the application.
3.10.1. Restoring files to an existing PVC
Procedure
Delete the file:
$ oc rsh demo-2-fxx6d
sh-4.2$ ls /opt/app-root/src/uploaded/
lost+found ocp_sop.txt
sh-4.2$ rm -rf /opt/app-root/src/uploaded/ocp_sop.txt
sh-4.2$ ls /opt/app-root/src/uploaded/
lost+found
Replace the file from the server that contains the rsync backup of the files that were in the pvc:
$ oc rsync uploaded demo-2-fxx6d:/opt/app-root/src/
Validate that the file is back on the pod by using oc rsh to connect to the pod and view the contents of the directory:

$ oc rsh demo-2-fxx6d
sh-4.2$ ls /opt/app-root/src/uploaded/
lost+found ocp_sop.txt
3.10.2. Restoring data to a new PVC
The following steps assume that a new pvc has been created.
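For reference, a minimal PVC definition of the kind assumed here might look like the following sketch; the claim name filestore matches the one used below, while the size and access mode are illustrative:

$ oc create -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: filestore
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
EOF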
Procedure
Overwrite the currently defined claim-name:

$ oc volume dc/demo --add --name=persistent-volume \
    --type=persistentVolumeClaim --claim-name=filestore \
    --mount-path=/opt/app-root/src/uploaded --overwrite
Validate that the pod is using the new PVC:
$ oc describe dc/demo
Name:           demo
Namespace:      test
Created:        3 hours ago
Labels:         app=demo
Annotations:    openshift.io/generated-by=OpenShiftNewApp
Latest Version: 3
Selector:       app=demo,deploymentconfig=demo
Replicas:       1
Triggers:       Config, Image(demo@latest, auto=true)
Strategy:       Rolling
Template:
  Labels:       app=demo
                deploymentconfig=demo
  Annotations:  openshift.io/container.demo.image.entrypoint=["container-entrypoint","/bin/sh","-c","$STI_SCRIPTS_PATH/usage"]
                openshift.io/generated-by=OpenShiftNewApp
  Containers:
   demo:
    Image:      docker-registry.default.svc:5000/test/demo@sha256:0a9f2487a0d95d51511e49d20dc9ff6f350436f935968b0c83fcb98a7a8c381a
    Port:       8080/TCP
    Volume Mounts:
      /opt/app-root/src/uploaded from persistent-volume (rw)
    Environment Variables:      <none>
  Volumes:
   persistent-volume:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  filestore
    ReadOnly:   false
...omitted...
Now that the deployment configuration uses the new pvc, run oc rsync to place the files onto the new pvc:

$ oc rsync uploaded demo-3-2b8gs:/opt/app-root/src/
sending incremental file list
uploaded/
uploaded/ocp_sop.txt
uploaded/lost+found/

sent 181 bytes  received 39 bytes  146.67 bytes/sec
total size is 32  speedup is 0.15
Validate that the file is back on the pod by using oc rsh to connect to the pod and view the contents of the directory:

$ oc rsh demo-3-2b8gs
sh-4.2$ ls /opt/app-root/src/uploaded/
lost+found ocp_sop.txt