Chapter 5. Host-level tasks

5.1. Adding a host to the cluster

For information on adding master or node hosts to a cluster, see the Adding hosts to an existing cluster section in the Install and configuration guide.

5.2. Master host tasks

5.2.1. Deprecating a master host

Deprecating a master host removes it from the OpenShift Container Platform environment.

The reasons to deprecate or scale down master hosts include hardware re-sizing or replacing the underlying infrastructure.

Highly available OpenShift Container Platform environments require at least three master hosts and three etcd nodes. Usually, the master hosts are colocated with the etcd services. If you deprecate a master host, you must also deprecate the etcd service on that host.

Important

Ensure that the master and etcd services are always deployed in odd numbers due to the voting mechanisms that take place among those services.
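
The arithmetic behind the odd-number rule can be sketched as follows. This is an illustration only, not part of the product: the function names `quorum_size` and `failures_tolerated` are hypothetical, and the calculation assumes simple majority voting, where more than half of the members must agree.

```shell
#!/bin/bash
# quorum_size N: smallest majority of an N-member cluster.
quorum_size() {
    echo $(( $1 / 2 + 1 ))
}

# failures_tolerated N: members that can fail while a majority remains.
failures_tolerated() {
    echo $(( $1 - ($1 / 2 + 1) ))
}

# Compare three-, four-, and five-member clusters.
for members in 3 4 5; do
    echo "members=${members} quorum=$(quorum_size "${members}") tolerated=$(failures_tolerated "${members}")"
done
```

Under this assumption, a four-member cluster tolerates the same single failure as a three-member cluster, so the extra even member adds no resilience.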

5.2.1.1. Creating a master host backup

Perform this backup process before any change to the OpenShift Container Platform infrastructure, such as a system update, upgrade, or any other significant modification. Back up data regularly to ensure that recent data is available if a failure occurs.

OpenShift Container Platform files

The master instances run important services, such as the API and controllers. The /etc/origin/master directory stores many important files:

The configuration, the API, controllers, services, and more
Certificates generated by the installation
All cloud provider-related configuration
Keys and other authentication files, such as htpasswd if you use htpasswd
And more

You can customize OpenShift Container Platform services, such as increasing the log level or using proxies. The configuration files are stored in the /etc/sysconfig directory.

Because the masters are also nodes, back up the entire /etc/origin directory.

Procedure

Important

You must perform the following steps on each master node.

Create a backup of the master host configuration files:
```
$ MYBACKUPDIR=/backup/$(hostname)/$(date +%Y%m%d)
$ sudo mkdir -p ${MYBACKUPDIR}/etc/sysconfig
$ sudo cp -aR /etc/origin ${MYBACKUPDIR}/etc
$ sudo cp -aR /etc/sysconfig/atomic-* ${MYBACKUPDIR}/etc/sysconfig/
```
Note
The master service settings are stored in the /etc/sysconfig/atomic-openshift-master-api and /etc/sysconfig/atomic-openshift-master-controllers files.
Warning
The /etc/origin/master/ca.serial.txt file is generated only on the first master listed in the Ansible host inventory. If you deprecate the first master host, copy the /etc/origin/master/ca.serial.txt file to the remaining master hosts before you begin.

Other important files that need to be considered when planning a backup include:

File	Description
`/etc/cni/*`	Container Network Interface configuration (if used)
`/etc/sysconfig/iptables`	Where the `iptables` rules are stored
`/etc/sysconfig/docker-storage-setup`	The input file for the `container-storage-setup` command
`/etc/sysconfig/docker`	The `docker` configuration file
`/etc/sysconfig/docker-network`	`docker` networking configuration (for example, MTU)
`/etc/sysconfig/docker-storage`	`docker` storage configuration (generated by `container-storage-setup`)
`/etc/dnsmasq.conf`	Main configuration file for `dnsmasq`
`/etc/dnsmasq.d/*`	Different `dnsmasq` configuration files
`/etc/sysconfig/flanneld`	`flannel` configuration file (if used)
`/etc/pki/ca-trust/source/anchors/`	Certificates added to the system (for example, for external registries)

Create a backup of those files:

$ MYBACKUPDIR=/backup/$(hostname)/$(date +%Y%m%d)
$ sudo mkdir -p ${MYBACKUPDIR}/etc/sysconfig
$ sudo mkdir -p ${MYBACKUPDIR}/etc/pki/ca-trust/source/anchors
$ sudo cp -aR /etc/sysconfig/{iptables,docker-*,flanneld} \
    ${MYBACKUPDIR}/etc/sysconfig/
$ sudo cp -aR /etc/dnsmasq* /etc/cni ${MYBACKUPDIR}/etc/
$ sudo cp -aR /etc/pki/ca-trust/source/anchors/* \
    ${MYBACKUPDIR}/etc/pki/ca-trust/source/anchors/

If a package is accidentally removed, or you need to restore a file that is included in an RPM package, a list of the RHEL packages installed on the system can be useful.
Note
If you use Red Hat Satellite features, such as content views or the facts store, use them to provide a proper mechanism for reinstalling missing packages and a history of the packages installed on the systems.
To create a list of the RHEL packages currently installed on the system:
```
$ MYBACKUPDIR=/backup/$(hostname)/$(date +%Y%m%d)
$ sudo mkdir -p ${MYBACKUPDIR}
$ rpm -qa | sort | sudo tee $MYBACKUPDIR/packages.txt
```

If you used the previous steps, the following files are present in the backup directory:

$ MYBACKUPDIR=/backup/$(hostname)/$(date +%Y%m%d)
$ sudo find ${MYBACKUPDIR} -mindepth 1 -type f -printf '%P\n'
etc/sysconfig/atomic-openshift-master
etc/sysconfig/atomic-openshift-master-api
etc/sysconfig/atomic-openshift-master-controllers
etc/sysconfig/atomic-openshift-node
etc/sysconfig/flanneld
etc/sysconfig/iptables
etc/sysconfig/docker-network
etc/sysconfig/docker-storage
etc/sysconfig/docker-storage-setup
etc/sysconfig/docker-storage-setup.rpmnew
etc/origin/master/ca.crt
etc/origin/master/ca.key
etc/origin/master/ca.serial.txt
etc/origin/master/ca-bundle.crt
etc/origin/master/master.proxy-client.crt
etc/origin/master/master.proxy-client.key
etc/origin/master/service-signer.crt
etc/origin/master/service-signer.key
etc/origin/master/serviceaccounts.private.key
etc/origin/master/serviceaccounts.public.key
etc/origin/master/openshift-master.crt
etc/origin/master/openshift-master.key
etc/origin/master/openshift-master.kubeconfig
etc/origin/master/master.server.crt
etc/origin/master/master.server.key
etc/origin/master/master.kubelet-client.crt
etc/origin/master/master.kubelet-client.key
etc/origin/master/admin.crt
etc/origin/master/admin.key
etc/origin/master/admin.kubeconfig
etc/origin/master/etcd.server.crt
etc/origin/master/etcd.server.key
etc/origin/master/master.etcd-client.key
etc/origin/master/master.etcd-client.csr
etc/origin/master/master.etcd-client.crt
etc/origin/master/master.etcd-ca.crt
etc/origin/master/policy.json
etc/origin/master/scheduler.json
etc/origin/master/htpasswd
etc/origin/master/session-secrets.yaml
etc/origin/master/openshift-router.crt
etc/origin/master/openshift-router.key
etc/origin/master/registry.crt
etc/origin/master/registry.key
etc/origin/master/master-config.yaml
etc/origin/generated-configs/master-master-1.example.com/master.server.crt
...[OUTPUT OMITTED]...
etc/origin/cloudprovider/openstack.conf
etc/origin/node/system:node:master-0.example.com.crt
etc/origin/node/system:node:master-0.example.com.key
etc/origin/node/ca.crt
etc/origin/node/system:node:master-0.example.com.kubeconfig
etc/origin/node/server.crt
etc/origin/node/server.key
etc/origin/node/node-dnsmasq.conf
etc/origin/node/resolv.conf
etc/origin/node/node-config.yaml
etc/origin/node/flannel.etcd-client.key
etc/origin/node/flannel.etcd-client.csr
etc/origin/node/flannel.etcd-client.crt
etc/origin/node/flannel.etcd-ca.crt
etc/pki/ca-trust/source/anchors/openshift-ca.crt
etc/pki/ca-trust/source/anchors/registry-ca.crt
etc/dnsmasq.conf
etc/dnsmasq.d/origin-dns.conf
etc/dnsmasq.d/origin-upstream-dns.conf
etc/dnsmasq.d/node-dnsmasq.conf
packages.txt
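
A quick sanity check of the backup directory can be sketched as follows. The function name `check_backup` is hypothetical, and the file list is only an illustrative subset of the listing above; extend it to match your environment.

```shell
#!/bin/bash
# check_backup DIR: report every expected file that is missing from DIR or
# empty. Returns 0 when all files are present, 1 otherwise.
check_backup() {
    local dir=$1 missing=0 f
    # Illustrative subset of the files shown in the listing above.
    for f in \
        etc/origin/master/master-config.yaml \
        etc/origin/master/ca.crt \
        etc/origin/master/ca.key \
        packages.txt
    do
        if [ ! -s "${dir}/${f}" ]; then
            echo "missing or empty: ${f}"
            missing=1
        fi
    done
    return "${missing}"
}
```

For example, run `check_backup "${MYBACKUPDIR}"` as a user that can read the backup directory, and treat a non-zero exit status as a reason to repeat the backup.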

If needed, you can compress the files to save space:

$ MYBACKUPDIR=/backup/$(hostname)/$(date +%Y%m%d)
$ sudo tar -zcvf /backup/$(hostname)-$(date +%Y%m%d).tar.gz $MYBACKUPDIR
$ sudo rm -Rf ${MYBACKUPDIR}
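
Because the last command deletes the uncompressed backup, it can be safer to confirm that the archive is readable first. The following is a minimal sketch under that assumption; `archive_and_remove` is a hypothetical helper, not part of the product or of the openshift-ansible-contrib repository. Unlike the commands above, it stores paths relative to the parent directory of the backup.

```shell
#!/bin/bash
# archive_and_remove DIR ARCHIVE: compress DIR into ARCHIVE and delete DIR
# only if the archive was written and can be listed back without errors.
archive_and_remove() {
    local dir=$1 archive=$2
    tar -zcf "${archive}" -C "$(dirname "${dir}")" "$(basename "${dir}")" \
        && tar -tzf "${archive}" > /dev/null \
        && rm -rf "${dir}"
}
```

Run as root, a call such as `archive_and_remove "${MYBACKUPDIR}" "/backup/$(hostname)-$(date +%Y%m%d).tar.gz"` leaves the directory in place if either tar step fails.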

To create any of these files from scratch, the openshift-ansible-contrib repository contains the backup_master_node.sh script, which performs the previous steps. The script creates a directory on the host where you run the script and copies all the files previously mentioned.

Note

The openshift-ansible-contrib script is not supported by Red Hat, but the reference architecture team performs testing to ensure the code operates as defined and is secure.

You can run the script on every master host with:

$ mkdir ~/git
$ cd ~/git
$ git clone https://github.com/openshift/openshift-ansible-contrib.git
$ cd openshift-ansible-contrib/reference-architecture/day2ops/scripts
$ ./backup_master_node.sh -h

5.2.1.2. Backing up etcd

When you back up etcd, you must back up both the etcd configuration files and the etcd data.

5.2.1.2.1. Backing up etcd configuration files

The etcd configuration files to be preserved are all stored in the /etc/etcd directory of the instances where etcd is running. This includes the etcd configuration file (/etc/etcd/etcd.conf) and the required certificates for cluster communication. All those files are generated at installation time by the Ansible installer.

Procedure

For each etcd member of the cluster, back up the etcd configuration.

$ ssh master-0
# mkdir -p /backup/etcd-config-$(date +%Y%m%d)/
# cp -R /etc/etcd/ /backup/etcd-config-$(date +%Y%m%d)/

Note

The certificates and configuration files on each etcd cluster member are unique.

5.2.1.2.2. Backing up etcd data

Prerequisites

Note

The OpenShift Container Platform installer creates aliases so that you do not have to type all the flags: etcdctl2 for etcd v2 tasks and etcdctl3 for etcd v3 tasks.

However, the etcdctl3 alias does not provide the full endpoint list to the etcdctl command, so the --endpoints option with all the endpoints must be provided.
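
One way to build the value for the --endpoints option is sketched below. The helper name `etcd_endpoints` is hypothetical; the sketch assumes that every member listens for clients over HTTPS on port 2379, as in the examples in this chapter.

```shell
#!/bin/bash
# etcd_endpoints HOST...: join the hosts into a comma-separated list of
# https://HOST:2379 URLs suitable for the etcdctl --endpoints option.
etcd_endpoints() {
    local list="" host
    for host in "$@"; do
        list="${list:+${list},}https://${host}:2379"
    done
    echo "${list}"
}
```

In an interactive shell where the alias is defined, this could be used as `etcdctl3 --endpoints "$(etcd_endpoints master-0.example.com master-1.example.com master-2.example.com)" endpoint health`.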

Before backing up etcd:

Ensure that the etcdctl binaries are available or, in containerized installations, that the rhel7/etcd container is available.
Ensure connectivity with the etcd cluster (port 2379/tcp).
Ensure that you have the proper certificates to connect to the etcd cluster.

Procedure

Note

While the etcdctl backup command is used to perform the backup, etcd v3 has no concept of a backup. Instead, you either take a snapshot from a live member with the etcdctl snapshot save command or copy the member/snap/db file from an etcd data directory.

The etcdctl backup command rewrites some of the metadata contained in the backup, specifically, the node ID and cluster ID, which means that in the backup, the node loses its former identity. To recreate a cluster from the backup, you create a new, single-node cluster, then add the rest of the nodes to the cluster. The metadata is rewritten to prevent the new node from joining an existing cluster.

Back up the etcd data:

If you use the v2 API, take the following actions:

Stop all etcd services:
```
# systemctl stop etcd.service
```

Create the etcd data backup and copy the etcd db file:

# mkdir -p /backup/etcd-$(date +%Y%m%d)
# etcdctl2 backup \
    --data-dir /var/lib/etcd \
    --backup-dir /backup/etcd-$(date +%Y%m%d)
# cp /var/lib/etcd/member/snap/db /backup/etcd-$(date +%Y%m%d)

Start all etcd services:
```
# systemctl start etcd.service
```

If you use the v3 API, run the following commands:

Important

Because clusters upgraded from previous versions of OpenShift Container Platform might contain v2 data stores, back up both v2 and v3 datastores.

Back up etcd v3 data:

# systemctl show etcd --property=ActiveState,SubState
# mkdir -p /backup/etcd-$(date +%Y%m%d)
# etcdctl3 snapshot save /backup/etcd-$(date +%Y%m%d)/db
Snapshot saved at /backup/etcd-<date>/db

Back up etcd v2 data:
```
# systemctl stop etcd.service
# etcdctl2 backup \
    --data-dir /var/lib/etcd \
    --backup-dir /backup/etcd-$(date +%Y%m%d)
# cp /var/lib/etcd/member/snap/db /backup/etcd-$(date +%Y%m%d)
# systemctl start etcd.service
```
Note
The etcdctl snapshot save command requires the etcd service to be running.
These commands create a /backup/etcd-<date>/ directory, where <date> represents the current date. The directory must reside on an external NFS share, S3 bucket, or other external storage location.
In an all-in-one cluster, the etcd data directory is /var/lib/origin/openshift.local.etcd.
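
When the backup location is a mounted filesystem, such as an NFS share, a rough check that it does not share a filesystem with the etcd data can be sketched as follows. The helper name `on_separate_filesystem` is hypothetical, and the sketch assumes GNU `stat`. A different device number is only a necessary sign of external storage: a second local disk would also pass the check.

```shell
#!/bin/bash
# on_separate_filesystem DATA_DIR BACKUP_DIR: succeed only when the two
# directories are on different filesystems (different device numbers).
# Returns 2 if either directory cannot be examined.
on_separate_filesystem() {
    local data_dev backup_dev
    data_dev=$(stat -c %d "$1") || return 2
    backup_dev=$(stat -c %d "$2") || return 2
    [ "${data_dev}" != "${backup_dev}" ]
}
```

For example, `on_separate_filesystem /var/lib/etcd /backup || echo "backup shares a filesystem with etcd data"`.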

5.2.1.3. Deprecating a master host

Master hosts run important services, such as the OpenShift Container Platform API and controllers services. In order to deprecate a master host, these services must be stopped.

The OpenShift Container Platform API service is an active/active service, so stopping the service does not affect the environment as long as the requests are sent to a separate master server. However, the OpenShift Container Platform controllers service is an active/passive service, where the services leverage etcd to decide the active master.

Deprecating a master host in a multi-master architecture includes removing the master from the load balancer pool so that new connections do not attempt to use that master. This process depends heavily on the load balancer used. The steps below show the details of removing the master from haproxy. If OpenShift Container Platform is running on a cloud provider, or uses an F5 appliance, see the specific product documents to remove the master from rotation.

Procedure

Remove the master's server entry from the backend section in the /etc/haproxy/haproxy.cfg configuration file. For example, if deprecating a master named master-0.example.com using haproxy, ensure that the host name no longer appears in the following section:

backend mgmt8443
    balance source
    mode tcp
    # MASTERS 8443
    server master-1.example.com 192.168.55.12:8443 check
    server master-2.example.com 192.168.55.13:8443 check

Then, restart the haproxy service.
```
$ sudo systemctl restart haproxy
```
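
The haproxy edit described above can also be scripted. The following is a minimal sketch rather than a supported procedure: `remove_backend_server` is a hypothetical helper that prints the configuration without the server lines for one host and leaves the original file untouched.

```shell
#!/bin/bash
# remove_backend_server CFG HOST: print CFG without the "server HOST ..."
# lines. The file itself is not modified.
remove_backend_server() {
    local cfg=$1 host=$2
    # Escape the dots so that the host name matches literally in the regex.
    local pattern=${host//./\\.}
    sed -E "/^[[:space:]]*server[[:space:]]+${pattern}[[:space:]]/d" "${cfg}"
}
```

One possible workflow is to write the result to a temporary file with `remove_backend_server /etc/haproxy/haproxy.cfg master-0.example.com > /tmp/haproxy.cfg.new`, validate it with `haproxy -c -f /tmp/haproxy.cfg.new`, and only then replace the live file and restart the service.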

Once the master is removed from the load balancer, disable the API and controller services:

$ sudo systemctl disable --now atomic-openshift-master-api
$ sudo systemctl disable --now atomic-openshift-master-controllers

Because the master host is a schedulable OpenShift Container Platform node, follow the steps in the Deprecating a node host section.
Remove the master host from the [masters] and [nodes] groups in the /etc/ansible/hosts Ansible inventory file to avoid issues if running any Ansible tasks using that inventory file.
Warning
Deprecating the first master host listed in the Ansible inventory file requires extra precautions.
The /etc/origin/master/ca.serial.txt file is generated only on the first master listed in the Ansible host inventory. If you deprecate the first master host, copy the /etc/origin/master/ca.serial.txt file to the remaining master hosts before you begin.

The kubernetes service includes the master host IPs as endpoints. To verify that the master has been properly deprecated, review the kubernetes service output and see if the deprecated master has been removed:

$ oc describe svc kubernetes -n default
Name:			kubernetes
Namespace:		default
Labels:			component=apiserver
			provider=kubernetes
Annotations:		<none>
Selector:		<none>
Type:			ClusterIP
IP:			10.111.0.1
Port:			https	443/TCP
Endpoints:		192.168.55.12:8443,192.168.55.13:8443
Port:			dns	53/UDP
Endpoints:		192.168.55.12:8053,192.168.55.13:8053
Port:			dns-tcp	53/TCP
Endpoints:		192.168.55.12:8053,192.168.55.13:8053
Session Affinity:	ClientIP
Events:			<none>
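
The same verification can be sketched as a script. The helper name `master_still_listed` is hypothetical, and 192.168.55.8 is assumed to be the address of the deprecated master, as in the etcd examples in this chapter.

```shell
#!/bin/bash
# master_still_listed IP "IP LIST": succeed if IP appears as a whole entry
# in the space-separated list of endpoint addresses.
master_still_listed() {
    local ip=$1 endpoints=$2 addr
    for addr in ${endpoints}; do
        [ "${addr}" = "${ip}" ] && return 0
    done
    return 1
}
```

The address list can be obtained with `oc get endpoints kubernetes -n default -o jsonpath='{.subsets[*].addresses[*].ip}'` and passed as the second argument. A zero exit status means that the deprecated master is still registered as an endpoint.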

After the master has been successfully deprecated, the host where the master was previously running can be safely deleted.

5.2.1.4. Removing an etcd host

If an etcd host fails beyond restoration, remove it from the cluster.

Steps to be performed on all master hosts

Procedure

Remove the failed etcd host from the etcd cluster. Run the following command for each failed etcd node:

# etcdctl -C https://<surviving host IP address>:2379 \
  --ca-file=/etc/etcd/ca.crt     \
  --cert-file=/etc/etcd/peer.crt     \
  --key-file=/etc/etcd/peer.key member remove <failed member ID>

Restart the master API service on every master:
```
# systemctl restart atomic-openshift-master-api
```
Or, if using a single master cluster installation:
```
# systemctl restart atomic-openshift-master
```

Steps to be performed in the current etcd cluster

Procedure

Remove the failed host from the cluster:

# etcdctl2 cluster-health
member 5ee217d19001 is healthy: got healthy result from https://192.168.55.12:2379
member 2a529ba1840722c0 is healthy: got healthy result from https://192.168.55.8:2379
failed to check the health of member 8372784203e11288 on https://192.168.55.21:2379: Get https://192.168.55.21:2379/health: dial tcp 192.168.55.21:2379: getsockopt: connection refused
member 8372784203e11288 is unreachable: [https://192.168.55.21:2379] are all unreachable
member ed4f0efd277d7599 is healthy: got healthy result from https://192.168.55.13:2379
cluster is healthy

# etcdctl2 member remove 8372784203e11288
Removed member 8372784203e11288 from cluster

# etcdctl2 cluster-health
member 5ee217d19001 is healthy: got healthy result from https://192.168.55.12:2379
member 2a529ba1840722c0 is healthy: got healthy result from https://192.168.55.8:2379
member ed4f0efd277d7599 is healthy: got healthy result from https://192.168.55.13:2379
cluster is healthy

The member remove command requires the etcd member ID, not the host name.
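
Because the command needs the member ID, it can help to extract the IDs of unreachable members from the health output. The following is a minimal sketch that assumes the cluster-health output format shown above; `unreachable_members` is a hypothetical helper name.

```shell
#!/bin/bash
# unreachable_members: read `etcdctl2 cluster-health` output on stdin and
# print the ID of every member reported as unreachable.
unreachable_members() {
    awk '$1 == "member" && $3 == "is" && $4 ~ /^unreachable/ { print $2 }'
}
```

In an interactive shell where the alias is defined, `etcdctl2 cluster-health | unreachable_members` prints one ID per line. Review the IDs before passing any of them to `etcdctl2 member remove`.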

To ensure the etcd configuration does not use the failed host when the etcd service is restarted, modify the /etc/etcd/etcd.conf file on all remaining etcd hosts and remove the failed host in the value for the ETCD_INITIAL_CLUSTER variable:

# vi /etc/etcd/etcd.conf

For example:

ETCD_INITIAL_CLUSTER=master-0.example.com=https://192.168.55.8:2380,master-1.example.com=https://192.168.55.12:2380,master-2.example.com=https://192.168.55.13:2380

becomes:

ETCD_INITIAL_CLUSTER=master-0.example.com=https://192.168.55.8:2380,master-1.example.com=https://192.168.55.12:2380
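
The transformation shown in this example can be expressed as a small function. This is a sketch only: `drop_cluster_member` is a hypothetical helper that works on the value of the variable and does not edit /etc/etcd/etcd.conf itself.

```shell
#!/bin/bash
# drop_cluster_member VALUE NAME: print the comma-separated VALUE of
# ETCD_INITIAL_CLUSTER without the "NAME=URL" entry for NAME.
drop_cluster_member() {
    local value=$1 name=$2 entry result=""
    local IFS=,
    for entry in ${value}; do
        # Each entry has the form NAME=URL; compare only the NAME part.
        [ "${entry%%=*}" = "${name}" ] && continue
        result="${result:+${result},}${entry}"
    done
    echo "${result}"
}
```

Passing the three-member value from the example together with `master-2.example.com` prints the two-member value shown above.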

Note

Restarting the etcd services is not required, because the failed host is removed using etcdctl.

Modify the Ansible inventory file to reflect the current status of the cluster and to avoid issues when re-running a playbook:

[OSEv3:children]
masters
nodes
etcd

... [OUTPUT ABBREVIATED] ...

[etcd]
master-0.example.com
master-1.example.com

If you are using Flannel, modify the flanneld service configuration located at /etc/sysconfig/flanneld on every host and remove the etcd host:

FLANNEL_ETCD_ENDPOINTS=https://master-0.example.com:2379,https://master-1.example.com:2379,https://master-2.example.com:2379

Restart the flanneld service:
```
# systemctl restart flanneld.service
```

5.2.2. Creating a master host backup

The procedure for creating a master host backup is identical to the one described in section 5.2.1.1. Follow the steps in that section on each master host.

5.2.3. Restoring a master host backup

If important master host files become corrupted or are accidentally removed after you create a backup, you can restore them by copying the files back to the master, ensuring that they contain the proper content, and restarting the affected services.

Procedure

Restore the /etc/origin/master/master-config.yaml file:

# MYBACKUPDIR=/backup/$(hostname)/$(date +%Y%m%d)
# cp /etc/origin/master/master-config.yaml /etc/origin/master/master-config.yaml.old
# cp ${MYBACKUPDIR}/etc/origin/master/master-config.yaml /etc/origin/master/master-config.yaml
# systemctl restart atomic-openshift-master-api
# systemctl restart atomic-openshift-master-controllers

Warning

Restarting the master services can lead to downtime. However, you can remove the master host from the highly available load balancer pool, then perform the restore operation. Once the service has been properly restored, you can add the master host back to the load balancer pool.

Note

Perform a full reboot of the affected instance to restore the iptables configuration.

If you cannot restart OpenShift Container Platform because packages are missing, reinstall the packages.
1. Get the list of the currently installed packages:
  $ rpm -qa | sort > /tmp/current_packages.txt
2. View the differences between the package lists:
  $ diff /tmp/current_packages.txt ${MYBACKUPDIR}/packages.txt
  > ansible-2.4.0.0-5.el7.noarch
3. Reinstall the missing packages:
  # yum reinstall -y <packages>
  Replace <packages> with the packages that are different between the package lists.
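
The comparison in step 2 can also be narrowed to only the packages that are recorded in the backup list but absent from the current system. The following is a minimal sketch, not part of the documented procedure; the function name missing_packages is hypothetical, and it assumes both files hold one package name per line.

```shell
# List packages recorded in the backup list that are absent from the current list.
# Both arguments are files with one package name per line.
missing_packages() {
    current_list=$1
    backup_list=$2
    tmpdir=$(mktemp -d) || return 1
    # comm needs both inputs sorted in the same collation order.
    LC_ALL=C sort -u "$current_list" > "$tmpdir/current"
    LC_ALL=C sort -u "$backup_list" > "$tmpdir/backup"
    # -1 drops lines unique to the current list and -3 drops lines common to
    # both, leaving only the lines unique to the backup list.
    LC_ALL=C comm -13 "$tmpdir/current" "$tmpdir/backup"
    rm -rf "$tmpdir"
}
```

For example, missing_packages /tmp/current_packages.txt ${MYBACKUPDIR}/packages.txt prints one candidate per line for the yum reinstall step.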

Restore a system certificate by copying the certificate to the /etc/pki/ca-trust/source/anchors/ directory and running the update-ca-trust command:

$ MYBACKUPDIR=/backup/$(hostname)/$(date +%Y%m%d)
$ sudo cp ${MYBACKUPDIR}/external_certificates/my_company.crt /etc/pki/ca-trust/source/anchors/
$ sudo update-ca-trust

Note

Always ensure the user ID and group ID are restored when the files are copied back, as well as the SELinux context.
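
One way to follow the note above is to copy with cp -a, which preserves ownership, mode, and timestamps, and to reset the SELinux label afterwards. The following is a minimal sketch under those assumptions; the function name restore_file is hypothetical, and the restorecon step is skipped on systems where the command is not installed.

```shell
# Restore one file from a backup copy, keeping the current file as <target>.old.
restore_file() {
    backup_copy=$1
    target=$2
    if [ ! -f "$backup_copy" ]; then
        echo "no backup copy: $backup_copy" >&2
        return 1
    fi
    # Keep the current file, if any, so the restore can be undone.
    if [ -e "$target" ]; then
        cp -a "$target" "$target.old" || return 1
    fi
    # -a preserves mode and timestamps, and ownership when run as root.
    cp -a "$backup_copy" "$target" || return 1
    # Reset the SELinux label to the policy default for the target path.
    if command -v restorecon >/dev/null 2>&1; then
        restorecon "$target" || true
    fi
    return 0
}
```

Run the function from a root shell so that ownership is preserved, for example restore_file ${MYBACKUPDIR}/origin/master/master-config.yaml /etc/origin/master/master-config.yaml, and then restart the affected services.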

5.3. Node host tasks

5.3.1. Deprecating a node host

The procedure is the same whether deprecating an infrastructure node or an application node.

Prerequisites

Ensure enough capacity is available to migrate the existing pods from the node set to be removed. Removing an infrastructure node is advised only when at least two more nodes will stay online after the infrastructure node is removed.

Procedure

List all available nodes to find the node to deprecate:

$ oc get nodes
NAME                  STATUS                     AGE       VERSION
ocp-infra-node-b7pl   Ready                      23h       v1.6.1+5115d708d7
ocp-infra-node-p5zj   Ready                      23h       v1.6.1+5115d708d7
ocp-infra-node-rghb   Ready                      23h       v1.6.1+5115d708d7
ocp-master-dgf8       Ready,SchedulingDisabled   23h       v1.6.1+5115d708d7
ocp-master-q1v2       Ready,SchedulingDisabled   23h       v1.6.1+5115d708d7
ocp-master-vq70       Ready,SchedulingDisabled   23h       v1.6.1+5115d708d7
ocp-node-020m         Ready                      23h       v1.6.1+5115d708d7
ocp-node-7t5p         Ready                      23h       v1.6.1+5115d708d7
ocp-node-n0dd         Ready                      23h       v1.6.1+5115d708d7

As an example, this topic deprecates the ocp-infra-node-b7pl infrastructure node.

Describe the node and its running services:

$ oc describe node ocp-infra-node-b7pl
Name:			ocp-infra-node-b7pl
Role:
Labels:			beta.kubernetes.io/arch=amd64
			beta.kubernetes.io/instance-type=n1-standard-2
			beta.kubernetes.io/os=linux
			failure-domain.beta.kubernetes.io/region=europe-west3
			failure-domain.beta.kubernetes.io/zone=europe-west3-c
			kubernetes.io/hostname=ocp-infra-node-b7pl
			role=infra
Annotations:		volumes.kubernetes.io/controller-managed-attach-detach=true
Taints:			<none>
CreationTimestamp:	Wed, 22 Nov 2017 09:36:36 -0500
Phase:
Conditions:
  ...
Addresses:		10.156.0.11,ocp-infra-node-b7pl
Capacity:
 cpu:		2
 memory:	7494480Ki
 pods:		20
Allocatable:
 cpu:		2
 memory:	7392080Ki
 pods:		20
System Info:
 Machine ID:			bc95ccf67d047f2ae42c67862c202e44
 System UUID:			9762CC3D-E23C-AB13-B8C5-FA16F0BCCE4C
 Boot ID:			ca8bf088-905d-4ec0-beec-8f89f4527ce4
 Kernel Version:		3.10.0-693.5.2.el7.x86_64
 OS Image:			Employee SKU
 Operating System:		linux
 Architecture:			amd64
 Container Runtime Version:	docker://1.12.6
 Kubelet Version:		v1.6.1+5115d708d7
 Kube-Proxy Version:		v1.6.1+5115d708d7
ExternalID:			437740049672994824
Non-terminated Pods:		(2 in total)
  Namespace			Name				CPU Requests	CPU Limits	Memory Requests	Memory Limits
  ---------			----				------------	----------	---------------	-------------
  default			docker-registry-1-5szjs		100m (5%)	0 (0%)		256Mi (3%)	0 (0%)
  default			router-1-vzlzq			100m (5%)	0 (0%)		256Mi (3%)	0 (0%)
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  CPU Requests	CPU Limits	Memory Requests	Memory Limits
  ------------	----------	---------------	-------------
  200m (10%)	0 (0%)		512Mi (7%)	0 (0%)
Events:		<none>

The output above shows that the node is running two pods: router-1-vzlzq and docker-registry-1-5szjs. Two more infrastructure nodes are available to migrate these two pods.

Note

The cluster described above is a highly available cluster, which means that both the router and docker-registry services are running on all infrastructure nodes.

Mark a node as unschedulable and evacuate all of its pods:

$ oc adm drain ocp-infra-node-b7pl --delete-local-data
node "ocp-infra-node-b7pl" cordoned
WARNING: Deleting pods with local storage: docker-registry-1-5szjs
pod "docker-registry-1-5szjs" evicted
pod "router-1-vzlzq" evicted
node "ocp-infra-node-b7pl" drained

If the pod has attached local storage (for example, EmptyDir), the --delete-local-data option must be provided. Generally, pods running in production should use the local storage only for temporary or cache files, but not for anything important or persistent. For regular storage, applications should use object storage or persistent volumes. In this case, the docker-registry pod’s local storage is empty, because the object storage is used instead to store the container images.

Note

The above operation deletes existing pods running on the node. Then, new pods are created according to the replication controller.

In general, every application should be deployed with a deployment configuration, which creates pods using the replication controller.

oc adm drain will not delete any bare pods (pods that are neither mirror pods nor managed by ReplicationController, ReplicaSet, DaemonSet, StatefulSet, or a job). To do so, the --force option is required. Be aware that the bare pods will not be recreated on other nodes and data may be lost during this operation.

The example below shows the output of the replication controller of the registry:

$ oc describe rc/docker-registry-1
Name:		docker-registry-1
Namespace:	default
Selector:	deployment=docker-registry-1,deploymentconfig=docker-registry,docker-registry=default
Labels:		docker-registry=default
		openshift.io/deployment-config.name=docker-registry
Annotations: ...
Replicas:	3 current / 3 desired
Pods Status:	3 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:		deployment=docker-registry-1
			deploymentconfig=docker-registry
			docker-registry=default
  Annotations:		openshift.io/deployment-config.latest-version=1
			openshift.io/deployment-config.name=docker-registry
			openshift.io/deployment.name=docker-registry-1
  Service Account:	registry
  Containers:
   registry:
    Image:	openshift3/ose-docker-registry:v3.6.173.0.49
    Port:	5000/TCP
    Requests:
      cpu:	100m
      memory:	256Mi
    Liveness:	http-get https://:5000/healthz delay=10s timeout=5s period=10s #success=1 #failure=3
    Readiness:	http-get https://:5000/healthz delay=0s timeout=5s period=10s #success=1 #failure=3
    Environment:
      REGISTRY_HTTP_ADDR:					:5000
      REGISTRY_HTTP_NET:					tcp
      REGISTRY_HTTP_SECRET:					tyGEnDZmc8dQfioP3WkNd5z+Xbdfy/JVXf/NLo3s/zE=
      REGISTRY_MIDDLEWARE_REPOSITORY_OPENSHIFT_ENFORCEQUOTA:	false
      REGISTRY_HTTP_TLS_KEY:					/etc/secrets/registry.key
      OPENSHIFT_DEFAULT_REGISTRY:				docker-registry.default.svc:5000
      REGISTRY_CONFIGURATION_PATH:				/etc/registry/config.yml
      REGISTRY_HTTP_TLS_CERTIFICATE:				/etc/secrets/registry.crt
    Mounts:
      /etc/registry from docker-config (rw)
      /etc/secrets from registry-certificates (rw)
      /registry from registry-storage (rw)
  Volumes:
   registry-storage:
    Type:	EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
   registry-certificates:
    Type:	Secret (a volume populated by a Secret)
    SecretName:	registry-certificates
    Optional:	false
   docker-config:
    Type:	Secret (a volume populated by a Secret)
    SecretName:	registry-config
    Optional:	false
Events:
  FirstSeen	LastSeen	Count	From			SubObjectPath	Type		Reason		Message
  ---------	--------	-----	----			-------------	--------	------		-------
  49m		49m		1	replication-controller			Normal		SuccessfulCreate	Created pod: docker-registry-1-dprp5

The event at the bottom of the output displays information about new pod creation. So, when listing all pods:

$ oc get pods
NAME                       READY     STATUS    RESTARTS   AGE
docker-registry-1-dprp5    1/1       Running   0          52m
docker-registry-1-kr8jq    1/1       Running   0          1d
docker-registry-1-ncpl2    1/1       Running   0          1d
registry-console-1-g4nqg   1/1       Running   0          1d
router-1-2gshr             0/1       Pending   0          52m
router-1-85qm4             1/1       Running   0          1d
router-1-q5sr8             1/1       Running   0          1d

The docker-registry-1-5szjs and router-1-vzlzq pods that were running on the now deprecated node are no longer available. Instead, two new pods have been created: docker-registry-1-dprp5 and router-1-2gshr. As shown above, the new router pod, router-1-2gshr, is in the Pending state. This is because each node can run only a single router pod, which binds to ports 80 and 443 of the host.

The following output for the newly created registry pod shows that the pod was created on the ocp-infra-node-rghb node, which is different from the node being deprecated:
```
$ oc describe pod docker-registry-1-dprp5
Name:			docker-registry-1-dprp5
Namespace:		default
Security Policy:	hostnetwork
Node:			ocp-infra-node-rghb/10.156.0.10
...
```
The only difference between deprecating the infrastructure and the application node is that once the infrastructure node is evacuated, and if there is no plan to replace that node, the services running on infrastructure nodes can be scaled down:
```
$ oc scale dc/router --replicas 2
deploymentconfig "router" scaled

$ oc scale dc/docker-registry --replicas 2
deploymentconfig "docker-registry" scaled
```

Now, every infrastructure node is running only one pod of each kind:

$ oc get pods
NAME                       READY     STATUS    RESTARTS   AGE
docker-registry-1-kr8jq    1/1       Running   0          1d
docker-registry-1-ncpl2    1/1       Running   0          1d
registry-console-1-g4nqg   1/1       Running   0          1d
router-1-85qm4             1/1       Running   0          1d
router-1-q5sr8             1/1       Running   0          1d

$ oc describe po/docker-registry-1-kr8jq | grep Node:
Node:			ocp-infra-node-p5zj/10.156.0.9

$ oc describe po/docker-registry-1-ncpl2 | grep Node:
Node:			ocp-infra-node-rghb/10.156.0.10

Note

To provide a full highly available cluster, at least three infrastructure nodes should always be available.

Verify that scheduling on the node is disabled:

$ oc get nodes
NAME                  STATUS                     AGE       VERSION
ocp-infra-node-b7pl   Ready,SchedulingDisabled   1d        v1.6.1+5115d708d7
ocp-infra-node-p5zj   Ready                      1d        v1.6.1+5115d708d7
ocp-infra-node-rghb   Ready                      1d        v1.6.1+5115d708d7
ocp-master-dgf8       Ready,SchedulingDisabled   1d        v1.6.1+5115d708d7
ocp-master-q1v2       Ready,SchedulingDisabled   1d        v1.6.1+5115d708d7
ocp-master-vq70       Ready,SchedulingDisabled   1d        v1.6.1+5115d708d7
ocp-node-020m         Ready                      1d        v1.6.1+5115d708d7
ocp-node-7t5p         Ready                      1d        v1.6.1+5115d708d7
ocp-node-n0dd         Ready                      1d        v1.6.1+5115d708d7

Also verify that the node does not contain any pods:

$ oc describe node ocp-infra-node-b7pl
Name:			ocp-infra-node-b7pl
Role:
Labels:			beta.kubernetes.io/arch=amd64
			beta.kubernetes.io/instance-type=n1-standard-2
			beta.kubernetes.io/os=linux
			failure-domain.beta.kubernetes.io/region=europe-west3
			failure-domain.beta.kubernetes.io/zone=europe-west3-c
			kubernetes.io/hostname=ocp-infra-node-b7pl
			role=infra
Annotations:		volumes.kubernetes.io/controller-managed-attach-detach=true
Taints:			<none>
CreationTimestamp:	Wed, 22 Nov 2017 09:36:36 -0500
Phase:
Conditions:
  ...
Addresses:		10.156.0.11,ocp-infra-node-b7pl
Capacity:
 cpu:		2
 memory:	7494480Ki
 pods:		20
Allocatable:
 cpu:		2
 memory:	7392080Ki
 pods:		20
System Info:
 Machine ID:			bc95ccf67d047f2ae42c67862c202e44
 System UUID:			9762CC3D-E23C-AB13-B8C5-FA16F0BCCE4C
 Boot ID:			ca8bf088-905d-4ec0-beec-8f89f4527ce4
 Kernel Version:		3.10.0-693.5.2.el7.x86_64
 OS Image:			Employee SKU
 Operating System:		linux
 Architecture:			amd64
 Container Runtime Version:	docker://1.12.6
 Kubelet Version:		v1.6.1+5115d708d7
 Kube-Proxy Version:		v1.6.1+5115d708d7
ExternalID:			437740049672994824
Non-terminated Pods:		(0 in total)
  Namespace			Name		CPU Requests	CPU Limits	Memory Requests	Memory Limits
  ---------			----		------------	----------	---------------	-------------
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  CPU Requests	CPU Limits	Memory Requests	Memory Limits
  ------------	----------	---------------	-------------
  0 (0%)	0 (0%)		0 (0%)		0 (0%)
Events:		<none>
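
The two checks above can also be scripted by parsing the command output. The following is a minimal sketch, not part of the documented procedure; the function names node_is_cordoned and node_pod_count are hypothetical, and both assume the column layout and the "Non-terminated Pods" line shown in the example output.

```shell
# Both helpers read command output on standard input, so they can be
# tried against saved output without access to a cluster.

# Succeeds if the named node is listed with SchedulingDisabled in its STATUS column.
node_is_cordoned() {
    awk -v node="$1" '
        $1 == node && $2 ~ /SchedulingDisabled/ { found = 1 }
        END { exit found ? 0 : 1 }
    '
}

# Prints the pod count from the "Non-terminated Pods:" line of "oc describe node" output.
node_pod_count() {
    sed -n 's/^Non-terminated Pods:[[:space:]]*(\([0-9][0-9]*\) in total).*/\1/p'
}
```

For example, oc get nodes | node_is_cordoned ocp-infra-node-b7pl succeeds once the node is cordoned, and oc describe node ocp-infra-node-b7pl | node_pod_count prints the number of pods still on the node.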

Remove the infrastructure instance from the backend section in the /etc/haproxy/haproxy.cfg configuration file:

backend router80
    balance source
    mode tcp
    server infra-1.example.com 192.168.55.12:80 check
    server infra-2.example.com 192.168.55.13:80 check

backend router443
    balance source
    mode tcp
    server infra-1.example.com 192.168.55.12:443 check
    server infra-2.example.com 192.168.55.13:443 check

Then, restart the haproxy service.
```
$ sudo systemctl restart haproxy
```
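
Removing the server lines for the deprecated node can be done by hand or with a small filter. The following is a minimal sketch, not part of the documented procedure; the function name remove_backend_server is hypothetical, and it assumes each backend entry is a single line of the form "server <name> <address> ...", as in the example configuration.

```shell
# Print the given HAProxy configuration without any "server <name> ..." lines.
# The original file is left untouched; redirect the output to a new file.
remove_backend_server() {
    cfg=$1
    name=$2
    # Compare whole fields rather than a pattern, so dots in the host name
    # are matched literally.
    awk -v name="$name" '!($1 == "server" && $2 == name)' "$cfg"
}
```

Writing the result to a separate file, for example remove_backend_server /etc/haproxy/haproxy.cfg infra-3.example.com > /tmp/haproxy.cfg.new (the host name is hypothetical), allows the new configuration to be reviewed and checked with haproxy -c -f /tmp/haproxy.cfg.new before it replaces the live file.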

After all pods are evicted, remove the node from the cluster:

$ oc delete node ocp-infra-node-b7pl
node "ocp-infra-node-b7pl" deleted

Verify that the node is no longer listed:

$ oc get nodes
NAME                  STATUS                     AGE       VERSION
ocp-infra-node-p5zj   Ready                      1d        v1.6.1+5115d708d7
ocp-infra-node-rghb   Ready                      1d        v1.6.1+5115d708d7
ocp-master-dgf8       Ready,SchedulingDisabled   1d        v1.6.1+5115d708d7
ocp-master-q1v2       Ready,SchedulingDisabled   1d        v1.6.1+5115d708d7
ocp-master-vq70       Ready,SchedulingDisabled   1d        v1.6.1+5115d708d7
ocp-node-020m         Ready                      1d        v1.6.1+5115d708d7
ocp-node-7t5p         Ready                      1d        v1.6.1+5115d708d7
ocp-node-n0dd         Ready                      1d        v1.6.1+5115d708d7

Note

For more information on evacuating and draining pods or nodes, see the Node maintenance section.

5.3.1.1. Replacing a node host

If you need to add a node in place of the deprecated node, follow the Adding hosts to an existing cluster section.

5.3.2. Creating a node host backup

Creating a backup of a node host is a different use case from backing up a master host. Because master hosts contain many important files, creating a backup is highly recommended. However, the nature of nodes is that anything special is replicated over the nodes in case of failover, and they typically do not contain data that is necessary to run an environment. If a node contains something necessary to run an environment, then creating a backup is recommended.

The backup process is to be performed before any change to the infrastructure, such as a system update, upgrade, or any other significant modification. Backups should be performed on a regular basis to ensure the most recent data is available if a failure occurs.

OpenShift Container Platform files

Node instances run applications in the form of pods, which are based on containers. The /etc/origin/ and /etc/origin/node directories house important files, such as:

The configuration of the node services
Certificates generated by the installation
Cloud provider-related configuration
Keys and other authentication files, such as the dnsmasq configuration

The OpenShift Container Platform services can be customized to increase the log level, use proxies, and more, and the configuration files are stored in the /etc/sysconfig directory.

Procedure

Create a backup of the node configuration files:

$ MYBACKUPDIR=/backup/$(hostname)/$(date +%Y%m%d)
$ sudo mkdir -p ${MYBACKUPDIR}/etc/sysconfig
$ sudo cp -aR /etc/origin ${MYBACKUPDIR}/etc
$ sudo cp -aR /etc/sysconfig/atomic-openshift-node ${MYBACKUPDIR}/etc/sysconfig/

OpenShift Container Platform uses specific files that must be taken into account when planning the backup policy, including:

File	Description
`/etc/cni/*`	Container Network Interface configuration (if used)
`/etc/sysconfig/iptables`	Where the `iptables` rules are stored
`/etc/sysconfig/docker-storage-setup`	The input file for `container-storage-setup` command
`/etc/sysconfig/docker`	The `docker` configuration file
`/etc/sysconfig/docker-network`	`docker` networking configuration (i.e. MTU)
`/etc/sysconfig/docker-storage`	`docker` storage configuration (generated by `container-storage-setup`)
`/etc/dnsmasq.conf`	Main configuration file for `dnsmasq`
`/etc/dnsmasq.d/*`	Different `dnsmasq` configuration files
`/etc/sysconfig/flanneld`	`flannel` configuration file (if used)
`/etc/pki/ca-trust/source/anchors/`	Certificates added to the system (i.e. for external registries)

To back up those files:

$ MYBACKUPDIR=/backup/$(hostname)/$(date +%Y%m%d)
$ sudo mkdir -p ${MYBACKUPDIR}/etc/sysconfig
$ sudo mkdir -p ${MYBACKUPDIR}/etc/pki/ca-trust/source/anchors
$ sudo cp -aR /etc/sysconfig/{iptables,docker-*,flanneld} \
    ${MYBACKUPDIR}/etc/sysconfig/
$ sudo cp -aR /etc/dnsmasq* /etc/cni ${MYBACKUPDIR}/etc/
$ sudo cp -aR /etc/pki/ca-trust/source/anchors/* \
    ${MYBACKUPDIR}/etc/pki/ca-trust/source/anchors/
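
Some of the files listed above are marked "if used" and might not exist on every node, in which case a plain cp reports an error for them. The following is a minimal sketch of a copy helper that skips missing paths; the function name backup_if_present is hypothetical, and it expects absolute source paths.

```shell
# Copy each existing path into the backup directory, recreating its parent
# directories under that root. Paths that do not exist are skipped with a
# notice on standard error. Source paths must be absolute.
backup_if_present() {
    dest_root=$1
    shift
    for src in "$@"; do
        if [ -e "$src" ]; then
            dest_dir="$dest_root$(dirname "$src")"
            mkdir -p "$dest_dir" || return 1
            # -a preserves mode and timestamps, and ownership when run as root.
            cp -aR "$src" "$dest_dir/" || return 1
        else
            echo "skipping missing path: $src" >&2
        fi
    done
}
```

Because sudo cannot call a shell function, run it from a root shell, for example backup_if_present ${MYBACKUPDIR} /etc/sysconfig/iptables /etc/sysconfig/flanneld /etc/cni.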

If a package is accidentally removed, or a file included in an rpm package must be restored, a list of the RHEL packages installed on the system can be useful.

Note

If you use Red Hat Satellite, features such as content views or the facts store provide a proper mechanism to reinstall missing packages and a history of the packages installed on the systems.

To create a list of the current RHEL packages installed on the system:
```
$ MYBACKUPDIR=/backup/$(hostname)/$(date +%Y%m%d)
$ sudo mkdir -p ${MYBACKUPDIR}
$ rpm -qa | sort | sudo tee $MYBACKUPDIR/packages.txt
```

The following files should now be present in the backup directory:

$ MYBACKUPDIR=/backup/$(hostname)/$(date +%Y%m%d)
$ sudo find ${MYBACKUPDIR} -mindepth 1 -type f -printf '%P\n'
etc/sysconfig/atomic-openshift-node
etc/sysconfig/flanneld
etc/sysconfig/iptables
etc/sysconfig/docker-network
etc/sysconfig/docker-storage
etc/sysconfig/docker-storage-setup
etc/sysconfig/docker-storage-setup.rpmnew
etc/origin/node/system:node:app-node-0.example.com.crt
etc/origin/node/system:node:app-node-0.example.com.key
etc/origin/node/ca.crt
etc/origin/node/system:node:app-node-0.example.com.kubeconfig
etc/origin/node/server.crt
etc/origin/node/server.key
etc/origin/node/node-dnsmasq.conf
etc/origin/node/resolv.conf
etc/origin/node/node-config.yaml
etc/origin/node/flannel.etcd-client.key
etc/origin/node/flannel.etcd-client.csr
etc/origin/node/flannel.etcd-client.crt
etc/origin/node/flannel.etcd-ca.crt
etc/origin/cloudprovider/openstack.conf
etc/pki/ca-trust/source/anchors/openshift-ca.crt
etc/pki/ca-trust/source/anchors/registry-ca.crt
etc/dnsmasq.conf
etc/dnsmasq.d/origin-dns.conf
etc/dnsmasq.d/origin-upstream-dns.conf
etc/dnsmasq.d/node-dnsmasq.conf
packages.txt

If needed, the files can be compressed to save space:

$ MYBACKUPDIR=/backup/$(hostname)/$(date +%Y%m%d)
$ sudo tar -zcvf /backup/$(hostname)-$(date +%Y%m%d).tar.gz $MYBACKUPDIR
$ sudo rm -Rf ${MYBACKUPDIR}
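
The commands above remove the backup directory whether or not the archive was written successfully. The following is a minimal sketch of a more defensive variant that deletes the directory only after the archive can be read back; the function name archive_and_remove is hypothetical.

```shell
# Compress a backup directory and remove it only if the archive is readable.
archive_and_remove() {
    src_dir=$1
    archive=$2
    # Archive relative to the parent directory so the archive holds relative paths.
    tar -C "$(dirname "$src_dir")" -czf "$archive" "$(basename "$src_dir")" || return 1
    # Listing the archive reads it end to end; a truncated or corrupt file fails here.
    tar -tzf "$archive" > /dev/null || return 1
    rm -rf "$src_dir"
}
```

Run it from a root shell, for example archive_and_remove ${MYBACKUPDIR} /backup/$(hostname)-$(date +%Y%m%d).tar.gz.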

To create any of these files from scratch, the openshift-ansible-contrib repository contains the backup_master_node.sh script, which performs the previous steps. The script creates a directory on the host running the script and copies all the files previously mentioned.

Note

The openshift-ansible-contrib script is not supported by Red Hat, but the reference architecture team performs testing to ensure the code operates as defined and is secure.

The script can be executed on every master host with:

$ mkdir ~/git
$ cd ~/git
$ git clone https://github.com/openshift/openshift-ansible-contrib.git
$ cd openshift-ansible-contrib/reference-architecture/day2ops/scripts
$ ./backup_master_node.sh -h

5.3.3. Restoring a node host backup

If important node host files become corrupted or are accidentally removed, you can restore them from a backup: copy the files back, ensure that they contain the proper content, and restart the affected services.

Procedure

Restore the /etc/origin/node/node-config.yaml file:

# MYBACKUPDIR=/backup/$(hostname)/$(date +%Y%m%d)
# cp /etc/origin/node/node-config.yaml /etc/origin/node/node-config.yaml.old
# cp /backup/$(hostname)/$(date +%Y%m%d)/etc/origin/node/node-config.yaml /etc/origin/node/node-config.yaml
# systemctl restart atomic-openshift-node

Warning

Restarting the services can lead to downtime. See Node maintenance for tips on how to ease the process.

Note

Perform a full reboot of the affected instance to restore the iptables configuration.

If you cannot restart OpenShift Container Platform because packages are missing, reinstall the packages.
1. Get the list of the currently installed packages:
  $ rpm -qa | sort > /tmp/current_packages.txt
2. View the differences between the package lists:
  $ diff /tmp/current_packages.txt ${MYBACKUPDIR}/packages.txt
  > ansible-2.4.0.0-5.el7.noarch
3. Reinstall the missing packages:
  # yum reinstall -y <packages>
  Replace <packages> with the packages that are different between the package lists.

Restore a system certificate by copying the certificate to the /etc/pki/ca-trust/source/anchors/ directory and running the update-ca-trust command:

$ MYBACKUPDIR=/backup/$(hostname)/$(date +%Y%m%d)
$ sudo cp ${MYBACKUPDIR}/etc/pki/ca-trust/source/anchors/my_company.crt /etc/pki/ca-trust/source/anchors/
$ sudo update-ca-trust

Note

Always ensure proper user ID and group ID are restored when the files are copied back, as well as the SELinux context.

5.3.4. Node maintenance and next steps

See the Managing nodes or Managing pods topics for various node management options. These include:

A node can reserve a portion of its resources to be used by specific components. These include the kubelet, kube-proxy, Docker, or other remaining system components such as sshd and NetworkManager. See the Allocating node resources section in the Cluster Administrator guide for more information.

5.4. etcd tasks

5.4.1. etcd backup

etcd is the key value store for all object definitions, as well as the persistent master state. Other components watch for changes, then bring themselves into the desired state.

OpenShift Container Platform versions prior to 3.5 use etcd version 2 (v2), while 3.5 and later use version 3 (v3). The data model between the two versions of etcd is different. etcd v3 can use both the v2 and v3 data models, whereas etcd v2 can only use the v2 data model. In an etcd v3 server, the v2 and v3 data stores exist in parallel and are independent.

For both v2 and v3 operations, you can use the ETCDCTL_API environment variable to use the proper API:

$ etcdctl -v
etcdctl version: 3.2.5
API version: 2
$ ETCDCTL_API=3 etcdctl version
etcdctl version: 3.2.5
API version: 3.2

See the Migrating etcd Data (v2 to v3) section in the OpenShift Container Platform 3.7 documentation for information about how to migrate to v3.

The etcd backup process is composed of two different procedures:

Configuration backup: Including the required etcd configuration and certificates
Data backup: Including both v2 and v3 data model.

You can perform the data backup process on any host that has connectivity to the etcd cluster, where the proper certificates are provided, and where the etcdctl tool is installed.

Note

The backup files must be copied to an external system, ideally outside the OpenShift Container Platform environment, and then encrypted.

Note that the etcd backup still has all the references to current storage volumes. When you restore etcd, OpenShift Container Platform starts launching the previous pods on nodes and reattaching the same storage. This process is no different than the process of when you remove a node from the cluster and add a new one back in its place. Anything attached to that node is reattached to the pods on whatever nodes they are rescheduled to.

5.4.1.1. Backing up etcd

When you back up etcd, you must back up both the etcd configuration files and the etcd data.

5.4.1.1.1. Backing up etcd configuration files

Procedure

For each etcd member of the cluster, back up the etcd configuration.

$ ssh master-0
# mkdir -p /backup/etcd-config-$(date +%Y%m%d)/
# cp -R /etc/etcd/ /backup/etcd-config-$(date +%Y%m%d)/

Note

The certificates and configuration files on each etcd cluster member are unique.

5.4.1.1.2. Backing up etcd data

Prerequisites

Note

The OpenShift Container Platform installer creates aliases, named etcdctl2 for etcd v2 tasks and etcdctl3 for etcd v3 tasks, so that you do not have to type all the flags.

However, the etcdctl3 alias does not provide the full endpoint list to the etcdctl command, so the --endpoints option with all the endpoints must be provided.
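
The value for the --endpoints option is a comma-separated list of URLs. The following is a minimal sketch of a helper that builds that list from host names; the function name etcd_endpoints is hypothetical, and it assumes every member serves clients over https on port 2379.

```shell
# Join host names into a comma-separated list of https endpoints on port 2379.
etcd_endpoints() {
    list=""
    for host in "$@"; do
        # Add a comma separator only when the list already has an entry.
        list="${list:+$list,}https://${host}:2379"
    done
    printf '%s\n' "$list"
}
```

In an interactive shell where the alias is defined, the result can be passed directly, for example etcdctl3 --endpoints "$(etcd_endpoints master-0 master-1 master-2)" endpoint health, where the host names stand in for the actual etcd members.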

Before backing up etcd:

Ensure that etcdctl binaries are available or, in containerized installations, that the rhel7/etcd container is available.
Ensure connectivity with the etcd cluster (port 2379/tcp).
Ensure that you have the proper certificates to connect to the etcd cluster.

Procedure

Note

Back up the etcd data:

If you use the v2 API, take the following actions:

Stop all etcd services:
```
# systemctl stop etcd.service
```

Create the etcd data backup and copy the etcd db file:

# mkdir -p /backup/etcd-$(date +%Y%m%d)
# etcdctl2 backup \
    --data-dir /var/lib/etcd \
    --backup-dir /backup/etcd-$(date +%Y%m%d)
# cp /var/lib/etcd/member/snap/db /backup/etcd-$(date +%Y%m%d)

Start all etcd services:
```
# systemctl start etcd.service
```

If you use the v3 API, run the following commands:

Important

Because clusters upgraded from previous versions of OpenShift Container Platform might contain v2 data stores, back up both v2 and v3 datastores.

Back up etcd v3 data:

# systemctl show etcd --property=ActiveState,SubState
# mkdir -p /backup/etcd-$(date +%Y%m%d)
# etcdctl3 snapshot save /backup/etcd-$(date +%Y%m%d)/db
Snapshot saved at /backup/etcd-<date>/db

Back up etcd v2 data:
```
# systemctl stop etcd.service
# etcdctl2 backup \
    --data-dir /var/lib/etcd \
    --backup-dir /backup/etcd-$(date +%Y%m%d)
# cp /var/lib/etcd/member/snap/db /backup/etcd-$(date +%Y%m%d)
# systemctl start etcd.service
```
Note

The etcdctl snapshot save command requires the etcd service to be running.

These commands create a /backup/etcd-<date>/ directory, where <date> represents the current date. The directory must be on an external NFS share, S3 bucket, or other external storage location.

In the case of an all-in-one cluster, the etcd data directory is located in the /var/lib/origin/openshift.local.etcd directory.
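
When the backup directory is later copied to another system, recording checksums first makes it possible to confirm that the copy is intact. The following is a minimal sketch, not part of the documented procedure; the function names and the SHA256SUMS file name are hypothetical, and it relies on the sha256sum tool from GNU coreutils.

```shell
# Record SHA-256 checksums for every file under the backup directory.
write_checksums() {
    ( cd "$1" && find . -type f ! -name SHA256SUMS -exec sha256sum {} + > SHA256SUMS )
}

# Recompute the checksums and compare; fails if any file is missing or altered.
verify_checksums() {
    ( cd "$1" && sha256sum --quiet -c SHA256SUMS )
}
```

For example, run write_checksums /backup/etcd-$(date +%Y%m%d) after taking the backup, and run verify_checksums against the copied directory on the destination system.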

5.4.2. Restoring etcd

The restore procedure for etcd configuration files replaces the appropriate files, then restarts the service.

If an etcd host has become corrupted and the /etc/etcd/etcd.conf file is lost, restore it using:

$ ssh master-0
# cp /backup/yesterday/master-0-files/etcd.conf /etc/etcd/etcd.conf
# restorecon -Rv /etc/etcd/etcd.conf
# systemctl restart etcd.service

In this example, the backup file is stored at /backup/yesterday/master-0-files/etcd.conf. That path can be backed by an external NFS share, S3 bucket, or other storage solution.

5.4.2.1. Restoring etcd v2 & v3 data

The following process restores healthy data files and starts the etcd cluster as a single node, then adds the rest of the nodes if an etcd cluster is required.

Procedure

Stop all etcd services:
```
# systemctl stop etcd.service
```
To ensure the proper backup is restored, delete the etcd directories:
- To back up the current etcd data before you delete the directory, run the following commands:
  # mv /var/lib/etcd /var/lib/etcd.old
  # mkdir /var/lib/etcd
  # chown -R etcd:etcd /var/lib/etcd/
  # restorecon -Rv /var/lib/etcd/
- Or, to delete the directory and the etcd data, run the following command:
  # rm -Rf /var/lib/etcd/*
  Note
  In an all-in-one cluster, the etcd data directory is located in the /var/lib/origin/openshift.local.etcd directory.

Restore a healthy backup data file to each of the etcd nodes. Perform this step on all etcd hosts, including master hosts colocated with etcd.

# cp -R /backup/etcd-xxx/* /var/lib/etcd/
# mv /var/lib/etcd/db /var/lib/etcd/member/snap/db
# chcon -R --reference /backup/etcd-xxx/* /var/lib/etcd/
# chown -R etcd:etcd /var/lib/etcd/

Run the etcd service on each host, forcing a new cluster.

This creates a custom file for the etcd service, which overwrites the execution command adding the --force-new-cluster option:

# mkdir -p /etc/systemd/system/etcd.service.d/
# echo "[Service]" > /etc/systemd/system/etcd.service.d/temp.conf
# echo "ExecStart=" >> /etc/systemd/system/etcd.service.d/temp.conf
# sed -n '/ExecStart/s/"$/ --force-new-cluster"/p' \
    /usr/lib/systemd/system/etcd.service \
    >> /etc/systemd/system/etcd.service.d/temp.conf

# systemctl daemon-reload
# systemctl restart etcd
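
To see what the sed expression in this step does before running it against the real unit file, it can be applied to a sample line. This is a sketch only: the sample ExecStart line is an assumed, simplified stand-in for the real one, and the append_force_flag function name is hypothetical.

```shell
# Assumed, simplified ExecStart line; the unit file on a real host may differ.
sample_line='ExecStart=/bin/bash -c "/usr/bin/etcd --name=example"'

# Same sed expression as in the procedure: on lines containing ExecStart,
# replace the closing quote at end of line with the flag plus a closing quote,
# and print only the lines where that substitution happened.
append_force_flag() {
    sed -n '/ExecStart/s/"$/ --force-new-cluster"/p'
}

printf '%s\n' "$sample_line" | append_force_flag
```

If the ExecStart line in a unit file does not end with a double quote, the expression prints nothing and the drop-in file would contain only the empty ExecStart= line, so it is worth inspecting temp.conf before reloading systemd.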

Check for error messages:
```
$ journalctl -fu etcd.service
```

Check for health status:

# etcdctl2 cluster-health
member 5ee217d17301 is healthy: got healthy result from https://192.168.55.8:2379
cluster is healthy

Restart the etcd service in cluster mode:

# rm -f /etc/systemd/system/etcd.service.d/temp.conf
# systemctl daemon-reload
# systemctl restart etcd

Check for health status and member list:

# etcdctl2 cluster-health
member 5ee217d17301 is healthy: got healthy result from https://192.168.55.8:2379
cluster is healthy

# etcdctl2 member list
5ee217d17301: name=master-0.example.com peerURLs=http://localhost:2380 clientURLs=https://192.168.55.8:2379 isLeader=true

After the first instance is running, you can restore the rest of your etcd servers.

5.4.2.1.1. Fixing the peerURLs parameter

After restoring the data and creating a new cluster, the peerURLs parameter shows localhost instead of the IP where etcd is listening for peer communication:

# etcdctl2 member list
5ee217d17301: name=master-0.example.com peerURLs=http://localhost:2380 clientURLs=https://192.168.55.8:2379 isLeader=true

5.4.2.1.1.1. Procedure

Get the member ID using etcdctl member list:
```
# etcdctl member list
```
Get the IP where etcd listens for peer communication:
```
$ ss -l4n | grep 2380
```

Update the member information with that IP:

# etcdctl2 member update 5ee217d17301 https://192.168.55.8:2380
Updated member with ID 5ee217d17301 in cluster
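
The member ID and the current peerURLs value used in these steps can be pulled out of a member list line with standard text tools. The following is a rough sketch that works on the sample line from this section rather than on live etcdctl output; the member_id and peer_urls function names are hypothetical.

```shell
# Sample line copied from the member list output shown in this section.
member_line='5ee217d17301: name=master-0.example.com peerURLs=http://localhost:2380 clientURLs=https://192.168.55.8:2379 isLeader=true'

# The member ID is everything before the first colon.
member_id() {
    printf '%s\n' "$1" | cut -d: -f1
}

# The peerURLs value is the space-separated token that starts with "peerURLs=".
peer_urls() {
    printf '%s\n' "$1" | tr ' ' '\n' | sed -n 's/^peerURLs=//p'
}

echo "ID: $(member_id "$member_line")"
echo "peerURLs: $(peer_urls "$member_line")"
```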

To verify, check that the IP is in the member list:

$ etcdctl2 member list
5ee217d17301: name=master-0.example.com peerURLs=https://192.168.55.8:2380 clientURLs=https://192.168.55.8:2379 isLeader=true

5.4.2.2. Restoring etcd for v3

The restore procedure for v3 data is similar to the restore procedure for the v2 data.

Snapshot integrity may be optionally verified at restore time. If the snapshot is taken with etcdctl snapshot save, it will have an integrity hash that is checked by etcdctl snapshot restore. If the snapshot is copied from the data directory, there is no integrity hash and it will only restore by using --skip-hash-check.

Important

The procedure to restore only the v3 data must be performed on a single etcd host. You can then add the rest of the nodes to the cluster.

Procedure

Stop all etcd services:
```
# systemctl stop etcd.service
```
Clear all old data, because etcdctl recreates it on the node where the restore procedure is performed:
```
# rm -Rf /var/lib/etcd
```

Run the snapshot restore command, substituting the values from the /etc/etcd/etcd.conf file:

# etcdctl3 snapshot restore /backup/etcd-xxxxxx/backup.db \
  --data-dir /var/lib/etcd \
  --name master-0.example.com \
  --initial-cluster "master-0.example.com=https://192.168.55.8:2380" \
  --initial-cluster-token "etcd-cluster-1" \
  --initial-advertise-peer-urls https://192.168.55.8:2380

2017-10-03 08:55:32.440779 I | mvcc: restore compact to 1041269
2017-10-03 08:55:32.468244 I | etcdserver/membership: added member 40bef1f6c79b3163 [https://192.168.55.8:2380] to cluster 26841ebcf610583c

Restore permissions and SELinux context to the restored files:

# chown -R etcd:etcd /var/lib/etcd/
# restorecon -Rv /var/lib/etcd

Start the etcd service:
```
# systemctl start etcd
```
Check for any error messages:
```
$ journalctl -fu etcd.service
```

5.4.3. Replacing an etcd host

To replace an etcd host, scale up the etcd cluster and then remove the host. This process ensures that you keep quorum if you lose an etcd host during the replacement procedure.

Warning

The etcd cluster must maintain a quorum during the replacement operation. This means that a majority of the cluster members must be in operation at all times.

If the host replacement operation occurs while the etcd cluster maintains a quorum, cluster operations are usually not affected. If a large amount of etcd data must replicate, some operations might slow down.

Note

Before you start any procedure involving the etcd cluster, you must have a backup of the etcd data and configuration files so that you can restore the cluster if the procedure fails.

5.4.4. Scaling etcd

You can scale the etcd cluster vertically by adding more resources to the etcd hosts or horizontally by adding more etcd hosts.

Note

Due to the voting system etcd uses, the cluster must always contain an odd number of members.

A cluster with an odd number of etcd hosts makes the best use of its hosts for fault tolerance. Adding a host to bring an even-sized cluster up to an odd size does not change the number needed for a quorum but increases the tolerance for failure. For example, with a cluster of three members, quorum is two, which leaves a failure tolerance of one. This ensures the cluster continues to operate if two of the members are healthy.

Having an in-production cluster of three etcd hosts is recommended.
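
The quorum arithmetic can be checked with a short calculation. The sketch below assumes the usual majority rule (quorum is more than half of the members); the quorum and tolerance function names are illustrative only.

```shell
# Quorum is a strict majority of members: floor(n / 2) + 1.
quorum() {
    echo $(( $1 / 2 + 1 ))
}

# Failure tolerance is how many members can be lost while keeping quorum.
tolerance() {
    echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 1 2 3 4 5; do
    echo "members=$n quorum=$(quorum "$n") tolerance=$(tolerance "$n")"
done
```

Running the loop shows that a four-member cluster tolerates no more failures than a three-member cluster, which is why odd sizes are preferred.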

The new host must be a fresh, dedicated Red Hat Enterprise Linux 7 host. For maximum performance, locate the etcd storage on an SSD, on a dedicated disk mounted at /var/lib/etcd.

Prerequisites

Before you add a new etcd host, perform a backup of both etcd configuration and data to prevent data loss.

Check the current etcd cluster status to avoid adding new hosts to an unhealthy cluster.

If you use the v2 etcd API, run this command:

# etcdctl --cert-file=/etc/etcd/peer.crt \
          --key-file=/etc/etcd/peer.key \
          --ca-file=/etc/etcd/ca.crt \
          --peers="https://master-0.example.com:2379,https://master-1.example.com:2379,https://master-2.example.com:2379" \
          cluster-health
member 5ee217d19001 is healthy: got healthy result from https://192.168.55.12:2379
member 2a529ba1840722c0 is healthy: got healthy result from https://192.168.55.8:2379
member ed4f0efd277d7599 is healthy: got healthy result from https://192.168.55.13:2379
cluster is healthy

If you use the v3 etcd API, run this command:

# ETCDCTL_API=3 etcdctl --cert="/etc/etcd/peer.crt" \
          --key=/etc/etcd/peer.key \
          --cacert="/etc/etcd/ca.crt" \
          --endpoints="https://master-0.example.com:2379,https://master-1.example.com:2379,https://master-2.example.com:2379" \
          endpoint health
https://master-0.example.com:2379 is healthy: successfully committed proposal: took = 5.011358ms
https://master-1.example.com:2379 is healthy: successfully committed proposal: took = 1.305173ms
https://master-2.example.com:2379 is healthy: successfully committed proposal: took = 1.388772ms

Before running the scaleup playbook, ensure the new host is registered to the proper Red Hat software channels:

# subscription-manager register \
    --username=<username> --password=<password>
# subscription-manager attach --pool=<poolid>
# subscription-manager repos --disable="*"
# subscription-manager repos \
    --enable=rhel-7-server-rpms \
    --enable=rhel-7-server-extras-rpms

etcd is hosted in the rhel-7-server-extras-rpms software channel.

Upgrade etcd and iptables on the current etcd nodes:
```
# yum update etcd iptables-services
```
Back up the /etc/etcd configuration for the etcd hosts.
If the new etcd members will also be OpenShift Container Platform nodes, add the desired number of hosts to the cluster.
The rest of this procedure assumes you added one host, but if you add multiple hosts, perform all steps on each host.

5.4.4.1. Adding a new etcd host using Ansible

Procedure

In the Ansible inventory file, create a new group named [new_etcd] and add the new host. Then, add the new_etcd group as a child of the [OSEv3] group:

[OSEv3:children]
masters
nodes
etcd
new_etcd

... [OUTPUT ABBREVIATED] ...

[etcd]
master-0.example.com
master-1.example.com
master-2.example.com

[new_etcd]
etcd0.example.com

The lines to add are the new_etcd entry under [OSEv3:children], the [new_etcd] group header, and the new host name below it.

From the host that installed OpenShift Container Platform and hosts the Ansible inventory file, run the etcd scaleup playbook:

$ ansible-playbook  /usr/share/ansible/openshift-ansible/playbooks/openshift-etcd/scaleup.yml

After the playbook runs, modify the inventory file to reflect the current status by moving the new etcd host from the [new_etcd] group to the [etcd] group:

[OSEv3:children]
masters
nodes
etcd
new_etcd

... [OUTPUT ABBREVIATED] ...

[etcd]
master-0.example.com
master-1.example.com
master-2.example.com
etcd0.example.com

If you use the service catalog, you must update its list of etcd servers:
```
$ oc edit ds apiserver -n kube-service-catalog
```
Add the FQDN for the new etcd node to the --etcd-servers argument. This argument contains a comma-separated list.

If you use Flannel, modify the flanneld service configuration on every OpenShift Container Platform host, located at /etc/sysconfig/flanneld, to include the new etcd host:

FLANNEL_ETCD_ENDPOINTS=https://master-0.example.com:2379,https://master-1.example.com:2379,https://master-2.example.com:2379,https://etcd0.example.com:2379
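
The FLANNEL_ETCD_ENDPOINTS value is a comma-separated list of client URLs, one per etcd host. When the same host list is maintained in several places, a small helper can generate the value so the entries stay consistent. This is a sketch; the etcd_endpoints function is a hypothetical name, and it assumes every host serves clients over HTTPS on port 2379 as in this example.

```shell
# Turn each host name argument into https://<host>:2379 and join with commas.
etcd_endpoints() {
    printf 'https://%s:2379\n' "$@" | paste -sd, -
}

echo "FLANNEL_ETCD_ENDPOINTS=$(etcd_endpoints \
    master-0.example.com master-1.example.com \
    master-2.example.com etcd0.example.com)"
```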

Restart the flanneld service:
```
# systemctl restart flanneld.service
```

5.4.4.2. Manually adding a new etcd host

Procedure

Modify the current etcd cluster

To create the etcd certificates, run the openssl command, replacing the values with those from your environment.

Create some environment variables:

export NEW_ETCD_HOSTNAME="etcd0.example.com"
export NEW_ETCD_IP="192.168.55.21"

export CN=$NEW_ETCD_HOSTNAME
export SAN="IP:${NEW_ETCD_IP}, DNS:${NEW_ETCD_HOSTNAME}"
export PREFIX="/etc/etcd/generated_certs/etcd-$CN/"
export OPENSSLCFG="/etc/etcd/ca/openssl.cnf"

Note

The custom openssl extensions used as etcd_v3_ca_* include the $SAN environment variable as subjectAltName. See /etc/etcd/ca/openssl.cnf for more information.
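
Before generating certificates, it can help to print the derived values and confirm they expand as intended. The sketch below repeats the example assignments from this step and echoes two of the results; nothing in it touches the file system.

```shell
# Example values from this procedure; substitute your own host name and IP.
NEW_ETCD_HOSTNAME="etcd0.example.com"
NEW_ETCD_IP="192.168.55.21"

CN=$NEW_ETCD_HOSTNAME
SAN="IP:${NEW_ETCD_IP}, DNS:${NEW_ETCD_HOSTNAME}"
PREFIX="/etc/etcd/generated_certs/etcd-$CN/"

# PREFIX ends with a slash, so file names are appended directly to it.
echo "subjectAltName source: ${SAN}"
echo "server key path: ${PREFIX}server.key"
```

Because PREFIX already ends with a slash, the later commands write paths such as ${PREFIX}server.key without adding a separator.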

Create the directory to store the configuration and certificates:
```
# mkdir -p ${PREFIX}
```

Create the server certificate request (server.csr) and sign it to produce the server certificate (server.crt):

# openssl req -new -config ${OPENSSLCFG} \
    -keyout ${PREFIX}server.key  \
    -out ${PREFIX}server.csr \
    -reqexts etcd_v3_req -batch -nodes \
    -subj /CN=$CN

# openssl ca -name etcd_ca -config ${OPENSSLCFG} \
    -out ${PREFIX}server.crt \
    -in ${PREFIX}server.csr \
    -extensions etcd_v3_ca_server -batch

Create the peer certificate request (peer.csr) and sign it to produce the peer certificate (peer.crt):

# openssl req -new -config ${OPENSSLCFG} \
    -keyout ${PREFIX}peer.key \
    -out ${PREFIX}peer.csr \
    -reqexts etcd_v3_req -batch -nodes \
    -subj /CN=$CN

# openssl ca -name etcd_ca -config ${OPENSSLCFG} \
  -out ${PREFIX}peer.crt \
  -in ${PREFIX}peer.csr \
  -extensions etcd_v3_ca_peer -batch

Copy the current etcd configuration and ca.crt files from the current node as examples to modify later:
```
# cp /etc/etcd/etcd.conf ${PREFIX}
# cp /etc/etcd/ca.crt ${PREFIX}
```

While still on the surviving etcd host, add the new host to the cluster. To add additional etcd members to the cluster, you must first adjust the default localhost peer in the peerURLs value for the first member:

Get the member ID for the first member using the member list command:

# etcdctl --cert-file=/etc/etcd/peer.crt \
    --key-file=/etc/etcd/peer.key \
    --ca-file=/etc/etcd/ca.crt \
    --peers="https://172.18.1.18:2379,https://172.18.9.202:2379,https://172.18.0.75:2379" \
    member list

In the --peers parameter value, ensure that you specify the URLs of only active etcd members.

Obtain the IP address where etcd listens for cluster peers:
```
$ ss -l4n | grep 2380
```

Update the value of peerURLs using the etcdctl member update command by passing the member ID and IP address obtained from the previous steps:

# etcdctl --cert-file=/etc/etcd/peer.crt \
    --key-file=/etc/etcd/peer.key \
    --ca-file=/etc/etcd/ca.crt \
    --peers="https://172.18.1.18:2379,https://172.18.9.202:2379,https://172.18.0.75:2379" \
    member update 511b7fb6cc0001 https://172.18.1.18:2380

Re-run the member list command and ensure the peer URLs no longer include localhost.

Add the new host to the etcd cluster. Note that the new host is not yet configured, so the status stays as unstarted until you configure the new host.

Warning

You must add each member and bring it online one at a time. When you add each additional member to the cluster, you must adjust the peerURLs list for the current peers. The peerURLs list grows by one for each member added. The etcdctl member add command outputs the values that you must set in the etcd.conf file as you add each member, as described in the following instructions.

# etcdctl -C https://${CURRENT_ETCD_HOST}:2379 \
  --ca-file=/etc/etcd/ca.crt \
  --cert-file=/etc/etcd/peer.crt \
  --key-file=/etc/etcd/peer.key member add ${NEW_ETCD_HOSTNAME} https://${NEW_ETCD_IP}:2380

Added member named 10.3.9.222 with ID 4e1db163a21d7651 to cluster

ETCD_NAME="<NEW_ETCD_HOSTNAME>"
ETCD_INITIAL_CLUSTER="<NEW_ETCD_HOSTNAME>=https://<NEW_HOST_IP>:2380,<CLUSTERMEMBER1_NAME>=https://<CLUSTERMEMBER1_IP>:2380,<CLUSTERMEMBER2_NAME>=https://<CLUSTERMEMBER2_IP>:2380,<CLUSTERMEMBER3_NAME>=https://<CLUSTERMEMBER3_IP>:2380"
ETCD_INITIAL_CLUSTER_STATE="existing"

In the "Added member named" line, 10.3.9.222 is a label for the etcd member. You can specify the host name, IP address, or a simple name.
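
The ETCD_INITIAL_CLUSTER value is a comma-separated list of name=peer-URL pairs, one per member. As an illustration only, the following sketch joins such pairs in the shell. The initial_cluster function is a hypothetical helper, and the member names and addresses are the example values used elsewhere in this section; the values printed by the member add command remain the ones to copy into etcd.conf.

```shell
# Join the arguments with commas. The function body runs in a subshell,
# so the IFS change does not leak into the calling shell.
initial_cluster() (
    IFS=,
    printf '%s\n' "$*"
)

ETCD_INITIAL_CLUSTER="$(initial_cluster \
    "etcd0.example.com=https://192.168.55.21:2380" \
    "master-0.example.com=https://192.168.55.8:2380" \
    "master-1.example.com=https://192.168.55.12:2380" \
    "master-2.example.com=https://192.168.55.13:2380")"

echo "ETCD_INITIAL_CLUSTER=\"${ETCD_INITIAL_CLUSTER}\""
```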

Update the sample ${PREFIX}/etcd.conf file.
1. Replace the following values with the values generated in the previous step:
  - ETCD_NAME
  - ETCD_INITIAL_CLUSTER
  - ETCD_INITIAL_CLUSTER_STATE
2. Modify the following variables with the new host IP from the output of the previous step. You can use ${NEW_ETCD_IP} as the value.
  - ETCD_LISTEN_PEER_URLS
  - ETCD_LISTEN_CLIENT_URLS
  - ETCD_INITIAL_ADVERTISE_PEER_URLS
  - ETCD_ADVERTISE_CLIENT_URLS
3. If you previously used the member system as an etcd node, you must overwrite the current values in the /etc/etcd/etcd.conf file.
4. Check the file for syntax errors or missing IP addresses; otherwise, the etcd service might fail:
  # vi ${PREFIX}/etcd.conf
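
The four URL variables can also be updated with sed instead of by hand. The sketch below works on a scratch copy so that it never touches a real configuration file. The sample file contents are assumptions (a real etcd.conf has more settings, and this expression only handles values that hold a single https URL).

```shell
# Work on a scratch file with assumed sample contents.
conf="$(mktemp)"
cat > "$conf" <<'EOF'
ETCD_NAME=master-0.example.com
ETCD_LISTEN_PEER_URLS=https://192.168.55.8:2380
ETCD_LISTEN_CLIENT_URLS=https://192.168.55.8:2379
ETCD_INITIAL_ADVERTISE_PEER_URLS=https://192.168.55.8:2380
ETCD_ADVERTISE_CLIENT_URLS=https://192.168.55.8:2379
EOF

NEW_ETCD_IP="192.168.55.21"

# Rewrite only the host part of the four URL variables,
# keeping the variable name, scheme, and port unchanged.
sed -i -E \
    "s#^(ETCD_(LISTEN_PEER|LISTEN_CLIENT|INITIAL_ADVERTISE_PEER|ADVERTISE_CLIENT)_URLS=https://)[^:]+#\1${NEW_ETCD_IP}#" \
    "$conf"

cat "$conf"
```

ETCD_NAME and the other cluster values are left alone here, because they come from the member add output rather than from the host IP.
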
On the node that hosts the installation files, update the [etcd] hosts group in the /etc/ansible/hosts inventory file. Remove the old etcd hosts and add the new ones.

Create a tgz file that contains the certificates, the sample configuration file, and the CA certificate, and copy it to the new host:

# tar -czvf /etc/etcd/generated_certs/${CN}.tgz -C ${PREFIX} .
# scp /etc/etcd/generated_certs/${CN}.tgz ${CN}:/tmp/

Modify the new etcd host

Install iptables-services to provide iptables utilities to open the required ports for etcd:
```
# yum install -y iptables-services
```

Create the OS_FIREWALL_ALLOW firewall rules to allow etcd to communicate:

Port 2379/tcp for clients

Port 2380/tcp for peer communication

# systemctl enable iptables.service --now
# iptables -N OS_FIREWALL_ALLOW
# iptables -t filter -I INPUT -j OS_FIREWALL_ALLOW
# iptables -A OS_FIREWALL_ALLOW -p tcp -m state --state NEW -m tcp --dport 2379 -j ACCEPT
# iptables -A OS_FIREWALL_ALLOW -p tcp -m state --state NEW -m tcp --dport 2380 -j ACCEPT
# iptables-save | tee /etc/sysconfig/iptables

Note

In this example, a new chain OS_FIREWALL_ALLOW is created, which is the standard naming the OpenShift Container Platform installer uses for firewall rules.

Warning

If the environment is hosted in an IaaS environment, modify the security groups for the instance to allow incoming traffic to those ports as well.
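
The two ACCEPT rules differ only in the port number, so they can be generated from a list of ports. The sketch below prints the commands instead of running them, which allows review before applying them as root. The etcd_firewall_rules function name is hypothetical.

```shell
# Print, rather than run, one ACCEPT rule per etcd port:
# 2379 for clients and 2380 for peer communication.
etcd_firewall_rules() {
    for port in 2379 2380; do
        echo "iptables -A OS_FIREWALL_ALLOW -p tcp -m state --state NEW -m tcp --dport ${port} -j ACCEPT"
    done
}

etcd_firewall_rules
```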

Install etcd:
```
# yum install -y etcd
```
Ensure version etcd-2.3.7-4.el7.x86_64 or greater is installed.
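
One rough way to script the minimum-version check is a version-sort comparison. This is a sketch under assumptions: version_at_least is a hypothetical helper, sort -V (GNU coreutils) only approximates RPM version ordering, and the version strings are sample values.

```shell
# Succeeds when version $1 is greater than or equal to version $2,
# judged by which of the two sorts first under version ordering.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

if version_at_least "3.2.22" "2.3.7"; then
    echo "installed version is new enough"
else
    echo "installed version is too old"
fi
```

On a real host, the installed version could be read with rpm -q --queryformat '%{VERSION}' etcd and passed as the first argument.
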
Ensure the etcd service is not running:
```
# systemctl disable etcd --now
```
Remove any etcd configuration and data:
```
# rm -Rf /etc/etcd/*
# rm -Rf /var/lib/etcd/*
```
Extract the certificates and configuration files:
```
# tar xzvf /tmp/etcd0.example.com.tgz -C /etc/etcd/
```

Modify the file ownership permissions:

# chown -R etcd:etcd /etc/etcd/*
# chown -R etcd:etcd /var/lib/etcd/

Start etcd on the new host:
```
# systemctl enable etcd --now
```

Verify that the host is part of the cluster and check the current cluster health:

If you use the v2 etcd API, run the following command:

# etcdctl --cert-file=/etc/etcd/peer.crt \
          --key-file=/etc/etcd/peer.key \
          --ca-file=/etc/etcd/ca.crt \
          --peers="https://master-0.example.com:2379,https://master-1.example.com:2379,https://master-2.example.com:2379,https://etcd0.example.com:2379" \
          cluster-health
member 5ee217d19001 is healthy: got healthy result from https://192.168.55.12:2379
member 2a529ba1840722c0 is healthy: got healthy result from https://192.168.55.8:2379
member 8b8904727bf526a5 is healthy: got healthy result from https://192.168.55.21:2379
member ed4f0efd277d7599 is healthy: got healthy result from https://192.168.55.13:2379
cluster is healthy

If you use the v3 etcd API, run the following command:

# ETCDCTL_API=3 etcdctl --cert="/etc/etcd/peer.crt" \
          --key=/etc/etcd/peer.key \
          --cacert="/etc/etcd/ca.crt" \
          --endpoints="https://master-0.example.com:2379,https://master-1.example.com:2379,https://master-2.example.com:2379,https://etcd0.example.com:2379" \
          endpoint health
https://master-0.example.com:2379 is healthy: successfully committed proposal: took = 5.011358ms
https://master-1.example.com:2379 is healthy: successfully committed proposal: took = 1.305173ms
https://master-2.example.com:2379 is healthy: successfully committed proposal: took = 1.388772ms
https://etcd0.example.com:2379 is healthy: successfully committed proposal: took = 1.498829ms

Modify each OpenShift Container Platform master

Modify the master configuration in the etcdClientInfo section of the /etc/origin/master/master-config.yaml file on every master. Add the new etcd host to the list of the etcd servers OpenShift Container Platform uses to store the data, and remove any failed etcd hosts:

etcdClientInfo:
  ca: master.etcd-ca.crt
  certFile: master.etcd-client.crt
  keyFile: master.etcd-client.key
  urls:
    - https://master-0.example.com:2379
    - https://master-1.example.com:2379
    - https://master-2.example.com:2379
    - https://etcd0.example.com:2379

Restart the master API service:
- On every master:
  # systemctl restart atomic-openshift-master-api
- Or, on a single master cluster installation:
  # systemctl restart atomic-openshift-master
  Warning
  The number of etcd nodes must be odd, so you must add at least two hosts.

If you use Flannel, modify the flanneld service configuration located at /etc/sysconfig/flanneld on every OpenShift Container Platform host to include the new etcd host:

FLANNEL_ETCD_ENDPOINTS=https://master-0.example.com:2379,https://master-1.example.com:2379,https://master-2.example.com:2379,https://etcd0.example.com:2379

Restart the flanneld service:
```
# systemctl restart flanneld.service
```

5.4.5. Removing an etcd host

If an etcd host fails beyond restoration, remove it from the cluster.

Steps to be performed on all master hosts

Procedure

Remove each failed etcd host from the etcd cluster. Run the following command for each failed etcd node:

# etcdctl -C https://<surviving host IP address>:2379 \
  --ca-file=/etc/etcd/ca.crt     \
  --cert-file=/etc/etcd/peer.crt     \
  --key-file=/etc/etcd/peer.key member remove <failed member ID>

Restart the master API service on every master:
```
# systemctl restart atomic-openshift-master-api
```
Or, if using a single master cluster installation:
```
# systemctl restart atomic-openshift-master
```

Steps to be performed in the current etcd cluster

Procedure

Remove the failed host from the cluster:

```
# etcdctl2 cluster-health
member 5ee217d19001 is healthy: got healthy result from https://192.168.55.12:2379
member 2a529ba1840722c0 is healthy: got healthy result from https://192.168.55.8:2379
failed to check the health of member 8372784203e11288 on https://192.168.55.21:2379: Get https://192.168.55.21:2379/health: dial tcp 192.168.55.21:2379: getsockopt: connection refused
member 8372784203e11288 is unreachable: [https://192.168.55.21:2379] are all unreachable
member ed4f0efd277d7599 is healthy: got healthy result from https://192.168.55.13:2379
cluster is healthy

# etcdctl2 member remove 8372784203e11288
Removed member 8372784203e11288 from cluster

# etcdctl2 cluster-health
member 5ee217d19001 is healthy: got healthy result from https://192.168.55.12:2379
member 2a529ba1840722c0 is healthy: got healthy result from https://192.168.55.8:2379
member ed4f0efd277d7599 is healthy: got healthy result from https://192.168.55.13:2379
cluster is healthy
```

The remove command requires the etcd ID, not the hostname.
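Because the remove command takes a member ID, it can help to pull the ID of the unreachable member out of the health output instead of copying it by hand. The following is a minimal sketch, assuming the output format shown in the example above; the function name `unreachable_member_ids` is hypothetical and not part of etcd or OpenShift Container Platform.

```shell
# Hypothetical helper: reads cluster-health output on stdin and prints the ID
# of every member that is reported as unreachable, one ID per line.
# Assumes lines of the form "member <ID> is unreachable: ...".
unreachable_member_ids() {
  awk '$1 == "member" && $3 == "is" && $4 == "unreachable:" { print $2 }'
}
```

With the alias used in this procedure, the helper could be fed as `etcdctl2 cluster-health | unreachable_member_ids`. Review the printed IDs before passing any of them to `member remove`.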

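To keep the etcd configuration from using the failed host when the etcd service restarts, edit the /etc/etcd/etcd.conf file on the remaining etcd hosts and remove the failed host from the value of the ETCD_INITIAL_CLUSTER variable:

```
# vi /etc/etcd/etcd.conf
```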

For example:

```
ETCD_INITIAL_CLUSTER=master-0.example.com=https://192.168.55.8:2380,master-1.example.com=https://192.168.55.12:2380,master-2.example.com=https://192.168.55.13:2380
```

becomes:

```
ETCD_INITIAL_CLUSTER=master-0.example.com=https://192.168.55.8:2380,master-1.example.com=https://192.168.55.12:2380
```
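The edit to ETCD_INITIAL_CLUSTER amounts to dropping one `name=peer URL` pair from a comma-separated list. The following sketch shows one way to compute the new value before editing the file; the function name `remove_initial_member` is hypothetical, and the host names and addresses are the example values used above.

```shell
# Hypothetical helper: print an ETCD_INITIAL_CLUSTER value with one member removed.
#   $1: member name to drop, for example master-2.example.com
#   $2: current value of ETCD_INITIAL_CLUSTER
remove_initial_member() {
  printf '%s\n' "$2" | tr ',' '\n' \
    | awk -v name="$1" 'index($0, name "=") != 1' \
    | paste -sd, -
}

# Example values taken from the procedure above.
current='master-0.example.com=https://192.168.55.8:2380,master-1.example.com=https://192.168.55.12:2380,master-2.example.com=https://192.168.55.13:2380'
remove_initial_member master-2.example.com "$current"
```

The helper only prints the new value. Writing it into /etc/etcd/etcd.conf remains a manual step, so the result can be checked first.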

Note

Restarting the etcd services is not required, because the failed host is removed using etcdctl.

Modify the Ansible inventory file to reflect the current status of the cluster and to avoid issues when re-running a playbook:

```
[OSEv3:children]
masters
nodes
etcd

... [OUTPUT ABBREVIATED] ...

[etcd]
master-0.example.com
master-1.example.com
```
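After editing the inventory, a quick check that the `[etcd]` group no longer lists the failed host can prevent surprises on the next playbook run. The following is a minimal sketch that assumes an INI-style inventory like the one above; the function name `inventory_group_hosts` is hypothetical, and the inventory path is whatever your installation uses.

```shell
# Hypothetical helper: print the host names listed under one inventory group.
#   $1: path to the Ansible inventory file
#   $2: group name without brackets, for example etcd
inventory_group_hosts() {
  awk -v group="[$2]" '
    /^\[/ { in_group = ($0 == group); next }    # a section header switches groups
    in_group && NF && $0 !~ /^#/ { print $1 }   # first field of a host line is its name
  ' "$1"
}
```

For example, `inventory_group_hosts <inventory file> etcd` should print only the surviving etcd hosts.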

If you are using Flannel, modify the flanneld service configuration located at /etc/sysconfig/flanneld on every host and remove the etcd host:

```
FLANNEL_ETCD_ENDPOINTS=https://master-0.example.com:2379,https://master-1.example.com:2379,https://master-2.example.com:2379
```
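When many hosts run Flannel, the same edit can be scripted. The following sketch removes one endpoint from the FLANNEL_ETCD_ENDPOINTS line and keeps a `.bak` copy of the file. The function name `remove_flannel_endpoint` is hypothetical, and the sketch assumes unquoted `https://<host>:2379` endpoints as in the example above.

```shell
# Hypothetical helper: remove one etcd endpoint from a flanneld configuration file.
#   $1: configuration file, for example /etc/sysconfig/flanneld
#   $2: host name of the etcd host to remove
remove_flannel_endpoint() {
  file=$1
  host=$2
  # Escape dots so the host name matches literally in the sed pattern.
  pattern=$(printf '%s' "$host" | sed 's/\./\\./g')
  # First expression drops the endpoint and a comma that follows it;
  # second expression drops a comma left at the end of the line.
  sed -i.bak \
    -e "/^FLANNEL_ETCD_ENDPOINTS=/s#https://${pattern}:2379,\{0,1\}##" \
    -e '/^FLANNEL_ETCD_ENDPOINTS=/s#,$##' \
    "$file"
}
```

Try the helper on a copy of the file first and compare the result with the original before changing the live configuration.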

Restart the flanneld service:
```
# systemctl restart flanneld.service
```
