Hosted control planes
Using hosted control planes with OpenShift Container Platform
Abstract
Chapter 1. Hosted control planes overview
You can deploy OpenShift Container Platform clusters by using two different control plane configurations: standalone or hosted control planes. The standalone configuration uses dedicated virtual machines or physical machines to host the control plane. With hosted control planes for OpenShift Container Platform, you create control planes as pods on a hosting cluster without the need for dedicated virtual or physical machines for each control plane.
1.1. Glossary of common concepts and personas for hosted control planes
When you use hosted control planes for OpenShift Container Platform, it is important to understand its key concepts and the personas that are involved.
1.1.1. Concepts
- hosted cluster
- An OpenShift Container Platform cluster with its control plane and API endpoint hosted on a management cluster. The hosted cluster includes the control plane and its corresponding data plane.
- hosted cluster infrastructure
- Network, compute, and storage resources that exist in the tenant or end-user cloud account.
- hosted control plane
- An OpenShift Container Platform control plane that runs on the management cluster, which is exposed by the API endpoint of a hosted cluster. The components of a control plane include etcd, the Kubernetes API server, the Kubernetes controller manager, and a VPN.
- hosting cluster
- See management cluster.
- managed cluster
- A cluster that the hub cluster manages. This term is specific to the cluster lifecycle that the multicluster engine for Kubernetes Operator manages in Red Hat Advanced Cluster Management. A managed cluster is not the same thing as a management cluster. For more information, see Managed cluster.
- management cluster
- An OpenShift Container Platform cluster where the HyperShift Operator is deployed and where the control planes for hosted clusters are hosted. The management cluster is synonymous with the hosting cluster.
- management cluster infrastructure
- Network, compute, and storage resources of the management cluster.
- node pool
- A resource that contains the compute nodes. The control plane contains node pools. The compute nodes run applications and workloads.
1.1.2. Personas
- cluster instance administrator
- Users who assume this role are the equivalent of administrators in standalone OpenShift Container Platform. This user has the cluster-admin role in the provisioned cluster, but might not have power over when or how the cluster is updated or configured. This user might have read-only access to see some configuration projected into the cluster.
- cluster instance user
- Users who assume this role are the equivalent of developers in standalone OpenShift Container Platform. This user does not have a view into OperatorHub or machines.
- cluster service consumer
- Users who assume this role can request control planes and worker nodes, drive updates, or modify externalized configurations. Typically, this user does not manage or access cloud credentials or infrastructure encryption keys. The cluster service consumer persona can request hosted clusters and interact with node pools. Users who assume this role have RBAC to create, read, update, or delete hosted clusters and node pools within a logical boundary.
- cluster service provider
- Users who assume this role typically have the cluster-admin role on the management cluster and have RBAC to monitor and own the availability of the HyperShift Operator as well as the control planes for the tenant’s hosted clusters. The cluster service provider persona is responsible for several activities, including the following examples:
  - Owning service-level objects for control plane availability, uptime, and stability
  - Configuring the cloud account for the management cluster to host control planes
  - Configuring the user-provisioned infrastructure, which includes the host awareness of available compute resources
 
1.2. Introduction to hosted control planes
You can use hosted control planes for Red Hat OpenShift Container Platform to reduce management costs, optimize cluster deployment time, and separate management and workload concerns so that you can focus on your applications.
Hosted control planes is available by using the multicluster engine for Kubernetes Operator version 2.0 or later on the following platforms:
- Bare metal by using the Agent provider
- OpenShift Virtualization
- Amazon Web Services (AWS), as a Technology Preview feature
- IBM Z, as a Technology Preview feature
- IBM Power, as a Technology Preview feature
1.2.1. Architecture of hosted control planes
OpenShift Container Platform is often deployed in a coupled, or standalone, model, where a cluster consists of a control plane and a data plane. The control plane includes an API endpoint, a storage endpoint, a workload scheduler, and an actuator that ensures state. The data plane includes compute, storage, and networking where workloads and applications run.
The standalone control plane is hosted by a dedicated group of nodes, which can be physical or virtual, with a minimum number to ensure quorum. The network stack is shared. Administrator access to a cluster offers visibility into the cluster’s control plane, machine management APIs, and other components that contribute to the state of a cluster.
Although the standalone model works well, some situations require an architecture where the control plane and data plane are decoupled. In those cases, the data plane is on a separate network domain with a dedicated physical hosting environment. The control plane is hosted by using high-level primitives such as deployments and stateful sets that are native to Kubernetes. The control plane is treated as any other workload.
1.2.2. Benefits of hosted control planes
With hosted control planes for OpenShift Container Platform, you can pave the way for a true hybrid-cloud approach and enjoy several other benefits.
- The security boundaries between management and workloads are stronger because the control plane is decoupled and hosted on a dedicated hosting service cluster. As a result, you are less likely to leak credentials for clusters to other users. Because infrastructure secret account management is also decoupled, cluster infrastructure administrators cannot accidentally delete control plane infrastructure.
- With hosted control planes, you can run many control planes on fewer nodes. As a result, clusters are more affordable.
- Because the control planes consist of pods that are launched on OpenShift Container Platform, control planes start quickly. The same principles apply to control planes and workloads, such as monitoring, logging, and auto-scaling.
- From an infrastructure perspective, you can push registries, HAProxy, cluster monitoring, storage nodes, and other infrastructure components to the tenant’s cloud provider account, isolating usage to the tenant.
- From an operational perspective, multicluster management is more centralized, which results in fewer external factors that affect the cluster status and consistency. Site reliability engineers have a central place to debug issues and navigate to the cluster data plane, which can lead to shorter Time to Resolution (TTR) and greater productivity.
1.3. Differences between hosted control planes and OpenShift Container Platform
Hosted control planes is a form factor of OpenShift Container Platform. Hosted clusters and standalone OpenShift Container Platform clusters are configured and managed differently. See the following tables to understand the differences between OpenShift Container Platform and hosted control planes:
1.3.1. Cluster creation and lifecycle
| OpenShift Container Platform | Hosted control planes | 
|---|---|
| You install a standalone OpenShift Container Platform cluster by using the | You install a hosted cluster by using the |
1.3.2. Cluster configuration
| OpenShift Container Platform | Hosted control planes | 
|---|---|
| You configure cluster-scoped resources such as authentication, API server, and proxy by using the | You configure resources that impact the control plane in the |
1.3.3. etcd encryption
| OpenShift Container Platform | Hosted control planes | 
|---|---|
| You configure etcd encryption by using the | You configure etcd encryption by using the |
1.3.4. Operators and control plane
| OpenShift Container Platform | Hosted control planes | 
|---|---|
| A standalone OpenShift Container Platform cluster contains separate Operators for each control plane component. | A hosted cluster contains a single Operator named Control Plane Operator that runs in the hosted control plane namespace on the management cluster. | 
| etcd uses storage that is mounted on the control plane nodes. The etcd cluster Operator manages etcd. | etcd uses a persistent volume claim for storage and is managed by the Control Plane Operator. | 
| The Ingress Operator, network-related Operators, and Operator Lifecycle Manager (OLM) run on the cluster. | The Ingress Operator, network-related Operators, and Operator Lifecycle Manager (OLM) run in the hosted control plane namespace on the management cluster. | 
| The OAuth server runs inside the cluster and is exposed through a route in the cluster. | The OAuth server runs inside the control plane and is exposed through a route, node port, or load balancer on the management cluster. | 
1.3.5. Updates
| OpenShift Container Platform | Hosted control planes | 
|---|---|
| The Cluster Version Operator (CVO) orchestrates the update process and monitors the | The hosted control planes update results in a change to the |
| After you update an OpenShift Container Platform cluster, both the control plane and compute machines are updated. | After you update the hosted cluster, only the control plane is updated. You perform node pool updates separately. | 
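The separation between control plane updates and node pool updates can be pictured with the following sketch. It assumes that both resources carry their own release image field under spec.release.image in the hypershift.openshift.io/v1beta1 API; the names and image values are hypothetical placeholders, so verify the field layout against the API reference for your version.

```yaml
# Sketch under stated assumptions; names and images are placeholders.
# Fragment of a HostedCluster resource: changing this image updates
# only the hosted control plane.
apiVersion: hypershift.openshift.io/v1beta1
kind: HostedCluster
metadata:
  name: <hosted_cluster_name>
  namespace: <hosted_cluster_namespace>
spec:
  release:
    image: <control_plane_release_image>
---
# Fragment of a NodePool resource: the node pool is updated separately
# by changing its own release image.
apiVersion: hypershift.openshift.io/v1beta1
kind: NodePool
metadata:
  name: <nodepool_name>
  namespace: <hosted_cluster_namespace>
spec:
  release:
    image: <node_pool_release_image>
```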
1.3.6. Machine configuration and management
| OpenShift Container Platform | Hosted control planes | 
|---|---|
| The | The |
| A set of control plane machines is available. | A set of control plane machines does not exist. | 
| You enable a machine health check by using the | You enable a machine health check through the |
| You enable autoscaling by using the | You enable autoscaling through the |
| Machines and machine sets are exposed in the cluster. | Machines, machine sets, and machine deployments from upstream Cluster CAPI Operator are used to manage machines but are not exposed to the user. | 
| All machine sets are upgraded automatically when you update the cluster. | You update your node pools independently from the hosted cluster updates. | 
| Only an in-place upgrade is supported in the cluster. | Both replace and in-place upgrades are supported in the hosted cluster. | 
| The Machine Config Operator manages configurations for machines. | The Machine Config Operator does not exist in hosted control planes. | 
| You configure machine Ignition by using the | You configure the |
| The Machine Config Daemon (MCD) manages configuration changes and updates on each of the nodes. | For an in-place upgrade, the node pool controller creates a run-once pod that updates a machine based on your configuration. | 
| You can modify the machine configuration resources such as the SR-IOV Operator. | You cannot modify the machine configuration resources. | 
1.3.7. Networking
| OpenShift Container Platform | Hosted control planes | 
|---|---|
| The Kube API server communicates with nodes directly, because the Kube API server and nodes exist in the same Virtual Private Cloud (VPC). | The Kube API server communicates with nodes through Konnectivity. The Kube API server and nodes exist in different Virtual Private Clouds (VPCs). | 
| Nodes communicate with the Kube API server through the internal load balancer. | Nodes communicate with the Kube API server through an external load balancer or a node port. | 
1.3.8. Web console
| OpenShift Container Platform | Hosted control planes | 
|---|---|
| The web console shows the status of a control plane. | The web console does not show the status of a control plane. | 
| You can update your cluster by using the web console. | You cannot update the hosted cluster by using the web console. | 
| The web console displays the infrastructure resources such as machines. | The web console does not display the infrastructure resources. | 
| You can configure machines through the | You cannot configure machines by using the web console. |
1.4. Relationship between hosted control planes, multicluster engine Operator, and RHACM
You can configure hosted control planes by using the multicluster engine for Kubernetes Operator. The multicluster engine is an integral part of Red Hat Advanced Cluster Management (RHACM) and is enabled by default with RHACM. The multicluster engine Operator cluster lifecycle defines the process of creating, importing, managing, and destroying Kubernetes clusters across various infrastructure cloud providers, private clouds, and on-premises data centers.
The multicluster engine Operator is the cluster lifecycle Operator that provides cluster management capabilities for OpenShift Container Platform and RHACM hub clusters. The multicluster engine Operator enhances cluster fleet management and supports OpenShift Container Platform cluster lifecycle management across clouds and data centers.
Figure 1.1. Cluster life cycle and foundation
You can use the multicluster engine Operator with OpenShift Container Platform as a standalone cluster manager or as part of a RHACM hub cluster.
A management cluster is also known as the hosting cluster.
You can deploy OpenShift Container Platform clusters by using two different control plane configurations: standalone or hosted control planes. The standalone configuration uses dedicated virtual machines or physical machines to host the control plane. With hosted control planes for OpenShift Container Platform, you create control planes as pods on a management cluster without the need for dedicated virtual or physical machines for each control plane.
Figure 1.2. RHACM and the multicluster engine Operator introduction diagram
1.5. Versioning for hosted control planes
With each major, minor, or patch version release of OpenShift Container Platform, two components of hosted control planes are released:
- The HyperShift Operator
- The hcp command-line interface (CLI)
The HyperShift Operator manages the lifecycle of hosted clusters that are represented by the HostedCluster API resources. The HyperShift Operator is released with each OpenShift Container Platform release. The HyperShift Operator creates the supported-versions config map in the hypershift namespace. The config map contains the supported hosted cluster versions.
			
You can host different versions of control planes on the same management cluster.
Example supported-versions config map object
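As a rough illustration of the object named above, a supported-versions config map might look like the following sketch. The version list and the label are assumptions for illustration only; inspect the object in the hypershift namespace of your own management cluster for the actual values.

```yaml
# Illustrative sketch: the version list and label are assumed example values.
apiVersion: v1
kind: ConfigMap
metadata:
  name: supported-versions
  namespace: hypershift
  labels:
    hypershift.openshift.io/supported-versions: "true"  # assumed label
data:
  # JSON document that lists the hosted cluster versions the Operator supports
  supported-versions: '{"versions":["4.14"]}'
```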
You can use the hcp CLI to create hosted clusters.
You can use the hypershift.openshift.io API resources, such as HostedCluster and NodePool, to create and manage OpenShift Container Platform clusters at scale. A HostedCluster resource contains the control plane and common data plane configuration. When you create a HostedCluster resource, you have a fully functional control plane with no attached nodes. A NodePool resource is a scalable set of worker nodes that is attached to a HostedCluster resource.
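To make the relationship between the two resources concrete, the following sketch shows a NodePool resource that attaches two worker nodes to a hosted cluster. All names, the platform type, and the release image are hypothetical placeholders, and the field layout assumes the hypershift.openshift.io/v1beta1 API; check the API reference for your version before you rely on it.

```yaml
# Sketch under stated assumptions; every value in angle brackets is a placeholder.
apiVersion: hypershift.openshift.io/v1beta1
kind: NodePool
metadata:
  name: <nodepool_name>
  namespace: <hosted_cluster_namespace>
spec:
  clusterName: <hosted_cluster_name>  # the HostedCluster that this pool attaches to
  replicas: 2                         # number of worker nodes in the pool
  management:
    upgradeType: Replace              # or InPlace
  platform:
    type: Agent                       # assumed platform for illustration
  release:
    image: <release_image>
```

Scaling the pool then amounts to changing the replicas value, which leaves the control plane untouched.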
			
The API version policy generally aligns with the policy for Kubernetes API versioning.
Chapter 2. Getting started with hosted control planes
To get started with hosted control planes for OpenShift Container Platform, you first configure your hosted cluster on the provider that you want to use. Then, you complete a few management tasks.
You can view the procedures by selecting from one of the following providers:
2.1. Bare metal
- Hosted control plane sizing guidance
- Installing the hosted control plane command line interface
- Distributing hosted cluster workloads
- Bare metal firewall and port requirements
- Bare metal infrastructure requirements: Review the infrastructure requirements to create a hosted cluster on bare metal.
- Configuring hosted control plane clusters on bare metal:
  - Configure DNS
  - Create a hosted cluster and verify cluster creation
  - Scale the NodePool object for the hosted cluster
  - Handle ingress traffic for the hosted cluster
  - Enable node auto-scaling for the hosted cluster
 
- Configuring hosted control planes in a disconnected environment
- To destroy a hosted cluster on bare metal, follow the instructions in Destroying a hosted cluster on bare metal.
- If you want to disable the hosted control plane feature, see Disabling the hosted control plane feature.
2.2. OpenShift Virtualization
- Hosted control plane sizing guidance
- Installing the hosted control plane command line interface
- Distributing hosted cluster workloads
- Managing hosted control plane clusters on OpenShift Virtualization: Create OpenShift Container Platform clusters with worker nodes that are hosted by KubeVirt virtual machines.
- Configuring hosted control planes in a disconnected environment
- To destroy a hosted cluster on OpenShift Virtualization, follow the instructions in Destroying a hosted cluster on OpenShift Virtualization.
- If you want to disable the hosted control plane feature, see Disabling the hosted control plane feature.
2.3. Amazon Web Services (AWS)
Hosted control planes on the AWS platform is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.
For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.
- AWS infrastructure requirements: Review the infrastructure requirements to create a hosted cluster on AWS.
- Configuring hosted control plane clusters on AWS (Technology Preview): The tasks to configure hosted control plane clusters on AWS include creating the AWS S3 OIDC secret, creating a routable public zone, enabling external DNS, enabling AWS PrivateLink, and deploying a hosted cluster.
- Deploying the SR-IOV Operator for hosted control planes: After you configure and deploy your hosting service cluster, you can create a subscription to the Single Root I/O Virtualization (SR-IOV) Operator on a hosted cluster. The SR-IOV pod runs on worker machines rather than the control plane.
- To destroy a hosted cluster on AWS, follow the instructions in Destroying a hosted cluster on AWS.
- If you want to disable the hosted control plane feature, see Disabling the hosted control plane feature.
2.4. IBM Z
Hosted control planes on the IBM Z platform is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.
For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.
2.5. IBM Power
Hosted control planes on the IBM Power platform is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.
For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.
Chapter 3. Authentication and authorization for hosted control planes
The OpenShift Container Platform control plane includes a built-in OAuth server. You can obtain OAuth access tokens to authenticate to the OpenShift Container Platform API. After you create your hosted cluster, you can configure OAuth by specifying an identity provider.
3.1. Configuring the OAuth server for a hosted cluster by using the CLI
You can configure the internal OAuth server for your hosted cluster by using an OpenID Connect identity provider (oidc).
You can configure OAuth for the following supported identity providers:
- oidc
- htpasswd
- keystone
- ldap
- basic-authentication
- request-header
- github
- gitlab
- google
Adding any identity provider in the OAuth configuration removes the default kubeadmin user provider.
When you configure identity providers, you must configure at least one NodePool replica in your hosted cluster in advance. Traffic for DNS resolution is sent through the worker nodes. You do not need to configure the NodePool replicas in advance for the htpasswd and request-header identity providers.
Prerequisites
- You created your hosted cluster.
Procedure
- Edit the HostedCluster custom resource (CR) on the hosting cluster by running the following command:
    $ oc edit hostedcluster <hosted_cluster_name> -n <hosted_cluster_namespace>
- Add the OAuth configuration in the HostedCluster CR. The fields of the configuration are described by the following numbered callouts:
- 1
- Specifies your hosted cluster name.
- 2
- Specifies your hosted cluster namespace.
- 3
- This provider name is prefixed to the value of the identity claim to form an identity name. The provider name is also used to build the redirect URL.
- 4
- Defines a list of attributes to use as the email address.
- 5
- Defines a list of attributes to use as a display name.
- 6
- Defines a list of attributes to use as a preferred user name.
- 7
- Defines the ID of a client registered with the OpenID provider. You must allow the client to redirect to the https://oauth-openshift.apps.<cluster_name>.<cluster_domain>/oauth2callback/<idp_provider_name> URL.
- 8
- Defines a secret of a client registered with the OpenID provider.
- 9
- The Issuer Identifier described in the OpenID spec. You must use https without a query or fragment component.
- 10
- Defines a mapping method that controls how mappings are established between identities of this provider and User objects.
 
- Save the file to apply the changes.
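The OAuth configuration from step 2 might look like the following sketch for an OpenID Connect identity provider. The field names follow the OpenShift OAuth identity provider schema as an assumption, the apiVersion is assumed to be v1beta1, and the provider name, issuer URL, and mapping method are hypothetical example values. The numbers in the comments are an assumed mapping to the callouts listed in the procedure.

```yaml
# Sketch under stated assumptions; values in angle brackets are placeholders.
apiVersion: hypershift.openshift.io/v1beta1
kind: HostedCluster
metadata:
  name: <hosted_cluster_name>            # (1)
  namespace: <hosted_cluster_namespace>  # (2)
spec:
  configuration:
    oauth:
      identityProviders:
      - name: <idp_provider_name>        # (3)
        type: OpenID
        mappingMethod: lookup            # (10) example value
        openID:
          claims:
            email:                       # (4)
            - <email_address>
            name:                        # (5)
            - <display_name>
            preferredUsername:           # (6)
            - <preferred_username>
          clientID: <client_id>          # (7)
          clientSecret:
            name: <client_secret_name>   # (8) name of a secret that holds the client secret
          issuer: https://example.com/identity  # (9) hypothetical issuer URL
```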
3.2. Configuring the OAuth server for a hosted cluster by using the web console
You can configure the internal OAuth server for your hosted cluster by using the OpenShift Container Platform web console.
You can configure OAuth for the following supported identity providers:
- oidc
- htpasswd
- keystone
- ldap
- basic-authentication
- request-header
- github
- gitlab
- google
Adding any identity provider in the OAuth configuration removes the default kubeadmin user provider.
When you configure identity providers, you must configure at least one NodePool replica in your hosted cluster in advance. Traffic for DNS resolution is sent through the worker nodes. You do not need to configure the NodePool replicas in advance for the htpasswd and request-header identity providers.
Prerequisites
- You logged in as a user with cluster-admin privileges.
- You created your hosted cluster.
Procedure
- Navigate to Home → API Explorer.
- Use the Filter by kind box to search for your HostedCluster resource.
- Click the HostedCluster resource that you want to edit.
- Click the Instances tab.
- Click the Options menu next to your hosted cluster name entry and click Edit HostedCluster.
- Add the OAuth configuration in the YAML file. The fields of the configuration are described by the following numbered callouts:
- 1
- This provider name is prefixed to the value of the identity claim to form an identity name. The provider name is also used to build the redirect URL.
- 2
- Defines a list of attributes to use as the email address.
- 3
- Defines a list of attributes to use as a display name.
- 4
- Defines a list of attributes to use as a preferred user name.
- 5
- Defines the ID of a client registered with the OpenID provider. You must allow the client to redirect to the https://oauth-openshift.apps.<cluster_name>.<cluster_domain>/oauth2callback/<idp_provider_name> URL.
- 6
- Defines a secret of a client registered with the OpenID provider.
- 7
- The Issuer Identifier described in the OpenID spec. You must use https without a query or fragment component.
- 8
- Defines a mapping method that controls how mappings are established between identities of this provider and User objects.
 
- Click Save.
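In the web console editor, only the spec portion of the HostedCluster resource needs to change. The following fragment is a sketch of what that portion might contain for an OpenID Connect provider; the field names are assumed from the OpenShift OAuth identity provider schema, the mapping method is an example value, and the numbers in the comments are an assumed mapping to the eight callouts in this procedure.

```yaml
# Fragment only: merge under the existing spec of the HostedCluster resource.
spec:
  configuration:
    oauth:
      identityProviders:
      - name: <idp_provider_name>        # (1)
        type: OpenID
        mappingMethod: lookup            # (8) example value
        openID:
          claims:
            email:                       # (2)
            - <email_address>
            name:                        # (3)
            - <display_name>
            preferredUsername:           # (4)
            - <preferred_username>
          clientID: <client_id>          # (5)
          clientSecret:
            name: <client_secret_name>   # (6)
          issuer: <issuer_url>           # (7) must use https
```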
Chapter 4. Handling a machine configuration for hosted control planes
In a standalone OpenShift Container Platform cluster, a machine config pool manages a set of nodes. You can handle a machine configuration by using the MachineConfigPool custom resource (CR).
You can reference any machineconfiguration.openshift.io resources in the nodepool.spec.config field of the NodePool CR.
In hosted control planes, the MachineConfigPool CR does not exist. A node pool contains a set of compute nodes. You can handle a machine configuration by using node pools.
4.1. Configuring node pools for hosted control planes
On hosted control planes, you can configure node pools by creating a MachineConfig object inside of a config map in the management cluster.
Procedure
- To create a MachineConfig object inside of a config map in the management cluster, define a ConfigMap object whose data contains the MachineConfig manifest. The callout in the configuration has the following meaning:
- 1
- Sets the path on the node where the MachineConfig object is stored.
 
- After you add the object to the config map, you can apply the config map to the node pool by editing the NodePool resource and referencing the config map in its spec.config field:
    $ oc edit nodepool <nodepool_name> --namespace <hosted_cluster_namespace>
- 1
- Replace <configmap_name> with the name of your config map.
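The two steps above can be sketched as follows. The first document is a config map that wraps a MachineConfig manifest, and the second is the part of the NodePool resource that references it. The data key, the Ignition version, the file path, the file contents, and all names are assumptions chosen for illustration.

```yaml
# Sketch under stated assumptions; names, path, and contents are hypothetical.
apiVersion: v1
kind: ConfigMap
metadata:
  name: <configmap_name>
  namespace: <hosted_cluster_namespace>
data:
  config: |
    apiVersion: machineconfiguration.openshift.io/v1
    kind: MachineConfig
    metadata:
      name: <machineconfig_name>
      labels:
        machineconfiguration.openshift.io/role: worker
    spec:
      config:
        ignition:
          version: 3.2.0
        storage:
          files:
          - path: /etc/example.conf      # (1) hypothetical path on the node
            mode: 420                    # decimal form of octal 0644
            overwrite: true
            contents:
              source: data:,example%20content
---
# Fragment of the NodePool resource that references the config map
spec:
  config:
  - name: <configmap_name>
```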
 
4.2. Referencing the kubelet configuration in node pools
To reference your kubelet configuration in node pools, you add the kubelet configuration in a config map and then apply the config map in the NodePool resource.
Procedure
- Add the kubelet configuration inside of a config map in the management cluster. The ConfigMap object contains the kubelet configuration.
- Apply the config map to the node pool by entering the following command, replacing <nodepool_name> with the name of your node pool:
    $ oc edit nodepool <nodepool_name> --namespace clusters
  In the NodePool resource configuration, reference the config map, replacing <configmap_name> with the name of your config map.
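A config map that carries a kubelet configuration, together with the NodePool fragment that references it, might look like the following sketch. The data key, the taint shown under kubeletConfig, and all names are assumptions for illustration; substitute the kubelet settings that you actually need.

```yaml
# Sketch under stated assumptions; names and kubelet settings are hypothetical.
apiVersion: v1
kind: ConfigMap
metadata:
  name: <configmap_name>
  namespace: clusters
data:
  config: |
    apiVersion: machineconfiguration.openshift.io/v1
    kind: KubeletConfig
    metadata:
      name: <kubeletconfig_name>
    spec:
      kubeletConfig:
        registerWithTaints:
        - key: "example.sh/unregistered"
          value: "true"
          effect: "NoExecute"
---
# Fragment of the NodePool resource that references the config map
spec:
  config:
  - name: <configmap_name>
```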
 
4.3. Configuring node tuning in a hosted cluster
To set node-level tuning on the nodes in your hosted cluster, you can use the Node Tuning Operator. In hosted control planes, you can configure node tuning by creating config maps that contain Tuned objects and referencing those config maps in your node pools.
Procedure
- Create a config map that contains a valid Tuned manifest, and reference the manifest in a node pool. In the example for this procedure, a Tuned manifest defines a profile that sets vm.dirty_ratio to 55 on nodes that contain the tuned-1-node-label node label with any value. Save the ConfigMap manifest in a file named tuned-1.yaml.
  Note: If you do not add any labels to an entry in the spec.recommend section of the Tuned spec, node-pool-based matching is assumed, so the highest priority profile in the spec.recommend section is applied to nodes in the pool. Although you can achieve more fine-grained node-label-based matching by setting a label value in the Tuned .spec.recommend.match section, node labels will not persist during an upgrade unless you set the .spec.management.upgradeType value of the node pool to InPlace.
- Create the ConfigMap object in the management cluster:
    $ oc --kubeconfig="$MGMT_KUBECONFIG" create -f tuned-1.yaml
- Reference the ConfigMap object in the spec.tuningConfig field of the node pool, either by editing a node pool or creating one. In this example, assume that you have only one NodePool, named nodepool-1, which contains 2 nodes.
  Note: You can reference the same config map in multiple node pools. In hosted control planes, the Node Tuning Operator appends a hash of the node pool name and namespace to the name of the Tuned CRs to distinguish them. Outside of this case, do not create multiple TuneD profiles of the same name in different Tuned CRs for the same hosted cluster.
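The procedure above can be sketched with the following two documents: a tuned-1.yaml config map that wraps a Tuned manifest, and the NodePool fragment that references it. The data key, the namespace of the config map, the priority, and the profile summary are assumptions for illustration; the profile name, sysctl value, and label name follow the description in the procedure.

```yaml
# tuned-1.yaml: sketch under stated assumptions.
apiVersion: v1
kind: ConfigMap
metadata:
  name: tuned-1
  namespace: clusters                    # assumed hosted cluster namespace
data:
  tuning: |
    apiVersion: tuned.openshift.io/v1
    kind: Tuned
    metadata:
      name: tuned-1
      namespace: openshift-cluster-node-tuning-operator
    spec:
      profile:
      - name: tuned-1-profile
        data: |
          [main]
          summary=Custom OpenShift profile
          include=openshift-node
          [sysctl]
          vm.dirty_ratio="55"
      recommend:
      - priority: 20
        profile: tuned-1-profile
        match:
        - label: tuned-1-node-label      # matches the label with any value
---
# Fragment of the nodepool-1 NodePool resource that references the config map
spec:
  tuningConfig:
  - name: tuned-1
```

Omitting the match entry would fall back to node-pool-based matching, as the first note in the procedure describes.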
Verification
Now that you have created the ConfigMap object that contains a Tuned manifest and referenced it in a NodePool, the Node Tuning Operator syncs the Tuned objects into the hosted cluster. You can verify which Tuned objects are defined and which TuneD profiles are applied to each node.
- List the Tuned objects in the hosted cluster:
    $ oc --kubeconfig="$HC_KUBECONFIG" get tuned.tuned.openshift.io -n openshift-cluster-node-tuning-operator
  Example output
    NAME       AGE
    default    7m36s
    rendered   7m36s
    tuned-1    65s
- List the Profile objects in the hosted cluster:
    $ oc --kubeconfig="$HC_KUBECONFIG" get profile.tuned.openshift.io -n openshift-cluster-node-tuning-operator
  Example output
    NAME                  TUNED             APPLIED   DEGRADED   AGE
    nodepool-1-worker-1   tuned-1-profile   True      False      7m43s
    nodepool-1-worker-2   tuned-1-profile   True      False      7m14s
  Note: If no custom profiles are created, the openshift-node profile is applied by default.
- To confirm that the tuning was applied correctly, start a debug shell on a node and check the sysctl values:
    $ oc --kubeconfig="$HC_KUBECONFIG" debug node/nodepool-1-worker-1 -- chroot /host sysctl vm.dirty_ratio
  Example output
    vm.dirty_ratio = 55
4.4. Deploying the SR-IOV Operator for hosted control planes
Hosted control planes on the AWS platform is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.
For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.
After you configure and deploy your hosting service cluster, you can create a subscription to the SR-IOV Operator on a hosted cluster. The SR-IOV pod runs on worker machines rather than the control plane.
Prerequisites
You must configure and deploy the hosted cluster on AWS. For more information, see Configuring the hosting cluster on AWS (Technology Preview).
Procedure
- Create a namespace and an Operator group.
- Create a subscription to the SR-IOV Operator.
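The manifests for these two steps are not reproduced in this section. As a minimal sketch only, the objects might look like the following. It assumes the `openshift-sriov-network-operator` namespace that the verification commands in this section use, and it assumes the `stable` channel and the `redhat-operators` catalog source, which you should check against your cluster. The function names and the `sriov-network-operators` and `sriov-network-operator-subscription` object names are hypothetical.

```shell
# Render the Namespace and OperatorGroup for the SR-IOV Operator.
render_sriov_namespace() {
  cat <<'EOF'
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-sriov-network-operator
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: sriov-network-operators          # hypothetical name
  namespace: openshift-sriov-network-operator
spec:
  targetNamespaces:
  - openshift-sriov-network-operator
EOF
}

# Render the Subscription. Channel and catalog source are assumptions.
render_sriov_subscription() {
  cat <<'EOF'
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: sriov-network-operator-subscription   # hypothetical name
  namespace: openshift-sriov-network-operator
spec:
  channel: stable
  name: sriov-network-operator
  config:
    nodeSelector:
      node-role.kubernetes.io/worker: ""      # keep the pods on worker machines
  source: redhat-operators
  sourceNamespace: openshift-marketplace
EOF
}

# Usage against the hosted cluster (not run here):
#   render_sriov_namespace    | oc apply -f -
#   render_sriov_subscription | oc apply -f -
```

Rendering the manifests through functions keeps them reviewable before anything is applied to the hosted cluster.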
Verification
- To verify that the SR-IOV Operator is ready, run the following command and view the resulting output:
  $ oc get csv -n openshift-sriov-network-operator
  Example output
  NAME                                         DISPLAY                   VERSION               REPLACES                                     PHASE
  sriov-network-operator.4.14.0-202211021237   SR-IOV Network Operator   4.14.0-202211021237   sriov-network-operator.4.14.0-202210290517   Succeeded
- To verify that the SR-IOV pods are deployed, run the following command:
  $ oc get pods -n openshift-sriov-network-operator
4.5. Configuring the NTP server for hosted clusters
You can configure the Network Time Protocol (NTP) server for your hosted clusters by using Butane.
Procedure
- Create a Butane config file, 99-worker-chrony.bu, that includes the contents of the chrony.conf file. For more information about Butane, see "Creating machine configs with Butane". When you write the file, observe the following points:
  - Specify an octal value for the mode field in the machine config file. After you create the file and apply the changes, the mode field is converted to a decimal value.
  - Specify any valid, reachable time source, such as the one provided by your Dynamic Host Configuration Protocol (DHCP) server.
  Note: NTP uses User Datagram Protocol (UDP) port 123 for machine-to-machine communication. If you configured an external NTP time server, you must open UDP port 123.
- Use Butane to generate a MachineConfig object file, 99-worker-chrony.yaml, that contains the configuration to be delivered to the nodes. Run the following command:
  $ butane 99-worker-chrony.bu -o 99-worker-chrony.yaml
- Add the contents of the 99-worker-chrony.yaml file inside of a config map in the management cluster. In the config map, replace <namespace> with the name of the namespace where you created the node pool, such as clusters.
- Apply the config map to your node pool by running the following command:
  $ oc edit nodepool <nodepool_name> --namespace <hosted_cluster_namespace>
  In the NodePool configuration, replace <configmap_name> with the name of your config map.
- Add the list of your NTP servers in the infra-env.yaml file, which defines the InfraEnv custom resource (CR). Replace <ntp_server> with the name of your NTP server. For more details about creating a host inventory and the InfraEnv CR, see "Creating a host inventory".
- Apply the InfraEnv CR by running the following command:
  $ oc apply -f infra-env.yaml
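The first step of this procedure refers to a Butane config whose contents are not shown here. The following sketch illustrates what such a file might contain. The Butane `version` value, the chrony directives, and the helper name `write_chrony_butane` are assumptions for illustration. `<ntp_server>` is the same placeholder that the procedure uses.

```shell
# Write a sketch of 99-worker-chrony.bu to the path given as $1.
write_chrony_butane() {
  cat > "$1" <<'EOF'
variant: openshift
version: 4.16.0              # assumption: match your OpenShift Container Platform version
metadata:
  name: 99-worker-chrony
  labels:
    machineconfiguration.openshift.io/role: worker
storage:
  files:
  - path: /etc/chrony.conf
    mode: 0644               # octal; shown as decimal 420 in the generated MachineConfig
    overwrite: true
    contents:
      inline: |
        pool <ntp_server> iburst
        driftfile /var/lib/chrony/drift
        makestep 1.0 3
        rtcsync
        logdir /var/log/chrony
EOF
}

# Usage (not run here):
#   write_chrony_butane 99-worker-chrony.bu
#   butane 99-worker-chrony.bu -o 99-worker-chrony.yaml
```

The octal mode 0644 corresponds to decimal 420, which is the value you can expect to see in the mode field after Butane converts the file.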
Verification
- Check the following fields to know the status of your host inventory:
  - conditions: The standard Kubernetes conditions indicating if the image was created successfully.
  - isoDownloadURL: The URL to download the Discovery Image.
  - createdTime: The time at which the image was last created. If you modify the InfraEnv CR, ensure that the timestamp has been updated before you download a new image.
- Verify that your host inventory is created by running the following command:
  $ oc describe infraenv <infraenv_resource_name> -n <infraenv_namespace>
  Note: If you modify the InfraEnv CR, confirm that the InfraEnv CR has created a new Discovery Image by looking at the createdTime field. If you already booted hosts, boot them again with the latest Discovery Image.
Chapter 5. Using feature gates in a hosted cluster
			You can use feature gates in a hosted cluster to enable features that are not part of the default set of features. You can enable the TechPreviewNoUpgrade feature set by using feature gates in your hosted cluster.
		
5.1. Enabling feature sets by using feature gates
				You can enable the TechPreviewNoUpgrade feature set in a hosted cluster by editing the HostedCluster custom resource (CR) with the OpenShift CLI.
			
Prerequisites
- You installed the OpenShift CLI (oc).
Procedure
- Open the HostedCluster CR for editing on the hosting cluster by running the following command:
  $ oc edit hostedcluster <hosted_cluster_name> -n <hosted_cluster_namespace>
- Define the feature set by entering a value in the featureSet field.
  Warning: Enabling the TechPreviewNoUpgrade feature set on your cluster cannot be undone and prevents minor version updates. This feature set allows you to enable these Technology Preview features on test clusters, where you can fully test them. Do not enable this feature set on production clusters.
- Save the file to apply the changes.
Verification
- Verify that the TechPreviewNoUpgrade feature gate is enabled in your hosted cluster by running the following command:
  $ oc get featuregate cluster -o yaml
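As an alternative to interactive editing, the same change can be sketched as a merge patch. The sketch assumes that the featureSet field sits under `spec.configuration.featureGate` in the HostedCluster CR; confirm the path against your CR before you rely on it. The helper name `feature_set_patch` is hypothetical.

```shell
# Build a JSON merge patch that sets the feature set on a HostedCluster.
feature_set_patch() {
  printf '{"spec":{"configuration":{"featureGate":{"featureSet":"%s"}}}}\n' "$1"
}

# Usage on the hosting cluster (not run here; the change cannot be undone):
#   oc patch hostedcluster <hosted_cluster_name> -n <hosted_cluster_namespace> \
#     --type=merge -p "$(feature_set_patch TechPreviewNoUpgrade)"
```

Printing the patch first lets you review exactly what is sent before you apply an irreversible setting.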
Chapter 6. Updating hosted control planes
Updates for hosted control planes involve updating the hosted cluster and the node pools. For a cluster to remain fully operational during an update process, you must meet the requirements of the Kubernetes version skew policy while completing the control plane and node updates.
6.1. Requirements to upgrade hosted control planes
The multicluster engine for Kubernetes Operator can manage one or more OpenShift Container Platform clusters. After you create a hosted cluster on OpenShift Container Platform, you must import your hosted cluster in the multicluster engine Operator as a managed cluster. Then, you can use the OpenShift Container Platform cluster as a management cluster.
Consider the following requirements before you start updating hosted control planes:
- You must use the bare metal platform for an OpenShift Container Platform cluster when using OpenShift Virtualization as a provider.
- You must use bare metal or OpenShift Virtualization as the cloud platform for the hosted cluster. You can find the platform type of your hosted cluster in the spec.platform.type specification of the HostedCluster custom resource (CR).
You must upgrade the OpenShift Container Platform cluster, multicluster engine Operator, hosted cluster, and node pools by completing the following tasks:
- Upgrade an OpenShift Container Platform cluster to the latest version. For more information, see "Updating a cluster using the web console" or "Updating a cluster using the CLI".
- Upgrade the multicluster engine Operator to the latest version. For more information, see "Updating installed Operators".
- Upgrade the hosted cluster and node pools from the previous OpenShift Container Platform version to the latest version. For more information, see "Updating a control plane in a hosted cluster" and "Updating node pools in a hosted cluster".
6.2. Setting channels in a hosted cluster
				You can see available updates in the HostedCluster.Status field of the HostedCluster custom resource (CR).
			
				The available updates are not fetched from the Cluster Version Operator (CVO) of a hosted cluster. The list of the available updates can be different from the available updates from the following fields of the HostedCluster custom resource (CR):
			
- status.version.availableUpdates
- status.version.conditionalUpdates
				The initial HostedCluster CR does not have any information in the status.version.availableUpdates and status.version.conditionalUpdates fields. After you set the spec.channel field to the stable OpenShift Container Platform release version, the HyperShift Operator reconciles the HostedCluster CR and updates the status.version field with the available and conditional updates.
			
				See the following example of the HostedCluster CR that contains the channel configuration:
			
- Replace <4.y> with the OpenShift Container Platform release version you specified in spec.release. For example, if you set the spec.release to ocp-release:4.16.4-multi, you must set spec.channel to stable-4.16.
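The mapping from a release image to its channel can be sketched with shell parameter expansion. The helper name `channel_for_release` is hypothetical, and the sketch assumes a tag-based image reference of the form `<repository>:<4.y.z>-<suffix>`. It does not handle digest references.

```shell
# Derive the stable channel name from a release image reference.
# Example from the text: ocp-release:4.16.4-multi -> stable-4.16
channel_for_release() (
  tag="${1##*:}"        # text after the last colon, for example 4.16.4-multi
  major="${tag%%.*}"    # 4
  rest="${tag#*.}"      # 16.4-multi
  minor="${rest%%.*}"   # 16
  printf 'stable-%s.%s\n' "$major" "$minor"
)

# Usage (not run here):
#   channel="$(channel_for_release "<openshift_release_image>")"
#   oc patch hostedcluster <hosted_cluster_name> -n <hosted_cluster_namespace> \
#     --type=merge -p "{\"spec\":{\"channel\":\"${channel}\"}}"
```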
				After you configure the channel in the HostedCluster CR, to view the output of the status.version.availableUpdates and status.version.conditionalUpdates fields, run the following command:
			
$ oc get -n <hosted_cluster_namespace> hostedcluster <hosted_cluster_name> -o yaml
6.3. Updating the OpenShift Container Platform version in a hosted cluster
Hosted control planes enables the decoupling of updates between the control plane and the data plane.
As a cluster service provider or cluster administrator, you can manage the control plane and the data plane separately.
				You can update a control plane by modifying the HostedCluster custom resource (CR) and a node by modifying its NodePool CR. Both the HostedCluster and NodePool CRs specify an OpenShift Container Platform release image in a .release field.
			
To keep your hosted cluster fully operational during an update process, the control plane and the node updates must follow the Kubernetes version skew policy.
6.3.1. The multicluster engine Operator hub management cluster
The multicluster engine for Kubernetes Operator requires a specific OpenShift Container Platform version for the management cluster to remain in a supported state. You can install the multicluster engine Operator from OperatorHub in the OpenShift Container Platform web console.
See the following support matrices for the multicluster engine Operator versions:
The multicluster engine Operator supports the following OpenShift Container Platform versions:
- The latest unreleased version
- The latest released version
- Two versions before the latest released version
You can also get the multicluster engine Operator version as a part of Red Hat Advanced Cluster Management (RHACM).
6.3.2. Supported OpenShift Container Platform versions in a hosted cluster
When deploying a hosted cluster, the OpenShift Container Platform version of the management cluster does not affect the OpenShift Container Platform version of a hosted cluster.
					The HyperShift Operator creates the supported-versions ConfigMap in the hypershift namespace. The supported-versions ConfigMap describes the range of supported OpenShift Container Platform versions that you can deploy.
				
					See the following example of the supported-versions ConfigMap:
				
						To create a hosted cluster, you must use the OpenShift Container Platform version from the support version range. However, the multicluster engine Operator can manage only between n+1 and n-2 OpenShift Container Platform versions, where n defines the current minor version. You can check the multicluster engine Operator support matrix to ensure the hosted clusters managed by the multicluster engine Operator are within the supported OpenShift Container Platform range.
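The n+1 to n-2 window can be expressed as a small check. This is a sketch that takes minor versions as plain integers, for example 16 for 4.16. The helper name `in_managed_range` is hypothetical.

```shell
# Succeed when a hosted cluster minor version lies within n-2..n+1,
# where n is the current minor version. Both arguments are integers.
in_managed_range() (
  n="$1"
  hosted="$2"
  [ "$hosted" -ge $((n - 2)) ] && [ "$hosted" -le $((n + 1)) ]
)

# Usage:
#   in_managed_range 16 14 && echo "4.14 is within range for n=4.16"
```

For n = 16, the check accepts minor versions 14 through 17 and rejects anything outside that window.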
					
To deploy a higher version of a hosted cluster on OpenShift Container Platform, you must update the multicluster engine Operator to a new minor version release to deploy a new version of the HyperShift Operator. Upgrading the multicluster engine Operator to a new patch, or z-stream, release does not update the HyperShift Operator to the next version.
					See the following example output of the hcp version command that shows the supported OpenShift Container Platform versions for OpenShift Container Platform 4.16 in the management cluster:
				
Client Version: openshift/hypershift: fe67b47fb60e483fe60e4755a02b3be393256343. Latest supported OCP: 4.17.0
Server Version: 05864f61f24a8517731664f8091cedcfc5f9b60d
Server Supports OCP Versions: 4.17, 4.16, 4.15, 4.14
6.4. Updates for the hosted cluster
				The spec.release value dictates the version of the control plane. The HostedCluster object transmits the intended spec.release value to the HostedControlPlane.spec.release value and runs the appropriate Control Plane Operator version.
			
The hosted control plane manages the rollout of the new version of the control plane components along with any OpenShift Container Platform components through the new version of the Cluster Version Operator (CVO).
					In hosted control planes, the NodeHealthCheck resource cannot detect the status of the CVO. A cluster administrator must manually pause the remediation triggered by NodeHealthCheck, before performing critical operations, such as updating the cluster, to prevent new remediation actions from interfering with cluster updates.
				
					To pause the remediation, enter the array of strings, for example, pause-test-cluster, as a value of the pauseRequests field in the NodeHealthCheck resource. For more information, see About the Node Health Check Operator.
				
After the cluster update is complete, you can edit or delete the remediation. Navigate to the Compute → NodeHealthCheck page, click your node health check, and then click Actions, which shows a drop-down list.
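The pause step described in this section can be sketched as a merge patch. The sketch assumes that pauseRequests is an array of strings under `spec` in the NodeHealthCheck resource and that the resource can be addressed as `nodehealthcheck`. Both are assumptions to verify in your cluster. The helper name `pause_requests_patch` is hypothetical.

```shell
# Build a JSON merge patch that sets spec.pauseRequests to the given strings.
pause_requests_patch() (
  items=""
  for reason in "$@"; do
    # Append each reason as a quoted JSON string, comma-separated.
    items="${items:+$items,}\"$reason\""
  done
  printf '{"spec":{"pauseRequests":[%s]}}\n' "$items"
)

# Usage (not run here):
#   oc patch nodehealthcheck <node_health_check_name> --type=merge \
#     -p "$(pause_requests_patch pause-test-cluster)"
# Calling the helper with no arguments yields an empty array, which could be
# used to clear the pause after the update.
```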
6.5. Updates for node pools
				With node pools, you can configure the software that is running in the nodes by exposing the spec.release and spec.config values. You can start a rolling node pool update in the following ways:
			
- Changing the spec.release or spec.config values.
- Changing any platform-specific field, such as the AWS instance type. The result is a set of new instances with the new type.
- Changing the cluster configuration, if the change propagates to the node.
				Node pools support replace updates and in-place updates. The nodepool.spec.release value dictates the version of any particular node pool. A NodePool object completes a replace or an in-place rolling update according to the .spec.management.upgradeType value.
			
After you create a node pool, you cannot change the update type. If you want to change the update type, you must create a node pool and delete the other one.
6.5.1. Replace updates for node pools
A replace update creates instances in the new version while it removes old instances from the previous version. This update type is effective in cloud environments where this level of immutability is cost effective.
Replace updates do not preserve any manual changes because the node is entirely re-provisioned.
6.5.2. In-place updates for node pools
An in-place update directly updates the operating systems of the instances. This type is suitable for environments where the infrastructure constraints are higher, such as bare metal.
In-place updates can preserve manual changes, but will report errors if you make manual changes to any file system or operating system configuration that the cluster directly manages, such as kubelet certificates.
6.6. Updating node pools in a hosted cluster
You can update your version of OpenShift Container Platform by updating the node pools in your hosted cluster. The node pool version must not surpass the hosted control plane version.
				The .spec.release field in the NodePool custom resource (CR) shows the version of a node pool.
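The rule that the node pool version must not surpass the hosted control plane version can be checked before you patch anything. The following sketch compares two versions of the form x.y.z numerically, field by field. The helper name `version_le` is hypothetical, and the sketch does not handle suffixes such as -multi.

```shell
# Succeed when version $1 is lower than or equal to version $2.
# Both arguments must have the form x.y.z with numeric fields.
version_le() (
  a="$1"
  b="$2"
  for i in 1 2 3; do
    x="$(printf '%s' "$a" | cut -d. -f"$i")"
    y="$(printf '%s' "$b" | cut -d. -f"$i")"
    x="${x:-0}"
    y="${y:-0}"
    if [ "$x" -lt "$y" ]; then exit 0; fi
    if [ "$x" -gt "$y" ]; then exit 1; fi
  done
  exit 0
)

# Usage:
#   version_le "<node_pool_version>" "<control_plane_version>" \
#     || echo "node pool version would surpass the control plane version"
```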
			
Procedure
- Change the spec.release.image value in the node pool by entering the following command:
  $ oc patch nodepool <node_pool_name> -n <hosted_cluster_namespace> --type=merge -p '{"spec":{"nodeDrainTimeout":"60s","release":{"image":"<openshift_release_image>"}}}'
  - Replace <node_pool_name> and <hosted_cluster_namespace> with your node pool name and hosted cluster namespace, respectively.
  - The <openshift_release_image> variable specifies the new OpenShift Container Platform release image that you want to upgrade to, for example, quay.io/openshift-release-dev/ocp-release:4.y.z-x86_64. Replace <4.y.z> with the supported OpenShift Container Platform version.
Verification
- To verify that the new version was rolled out, check the .status.conditions value in the node pool by running the following command:
  $ oc get -n <hosted_cluster_namespace> nodepool <node_pool_name> -o yaml
  In the output, <4.y.z> represents the supported OpenShift Container Platform version.
6.7. Updating a control plane in a hosted cluster
				On hosted control planes, you can upgrade your version of OpenShift Container Platform by updating the hosted cluster. The .spec.release field in the HostedCluster custom resource (CR) shows the version of the control plane. The HostedCluster object propagates the .spec.release value to the HostedControlPlane.spec.release field and runs the appropriate Control Plane Operator version.
			
				The HostedControlPlane resource orchestrates the rollout of the new version of the control plane components along with the OpenShift Container Platform component in the data plane through the new version of the Cluster Version Operator (CVO). The HostedControlPlane includes the following artifacts:
			
- CVO
- Cluster Network Operator (CNO)
- Cluster Ingress Operator
- Manifests for the Kube API server, scheduler, and manager
- Machine approver
- Autoscaler
- Infrastructure resources to enable ingress for control plane endpoints such as the Kube API server, ignition, and konnectivity
				You can set the .spec.release field in the HostedCluster CR to update the control plane by using the information from the status.version.availableUpdates and status.version.conditionalUpdates fields.
			
Procedure
- Add the hypershift.openshift.io/force-upgrade-to=<openshift_release_image> annotation to the hosted cluster by entering the following command:
  $ oc annotate hostedcluster -n <hosted_cluster_namespace> <hosted_cluster_name> "hypershift.openshift.io/force-upgrade-to=<openshift_release_image>" --overwrite
  - Replace <hosted_cluster_name> and <hosted_cluster_namespace> with your hosted cluster name and hosted cluster namespace, respectively.
  - The <openshift_release_image> variable specifies the new OpenShift Container Platform release image that you want to upgrade to, for example, quay.io/openshift-release-dev/ocp-release:4.y.z-x86_64. Replace <4.y.z> with the supported OpenShift Container Platform version.
- Change the spec.release.image value in the hosted cluster by entering the following command:
  $ oc patch hostedcluster <hosted_cluster_name> -n <hosted_cluster_namespace> --type=merge -p '{"spec":{"release":{"image":"<openshift_release_image>"}}}'
Verification
- To verify that the new version was rolled out, check the .status.conditions and .status.version values in the hosted cluster by running the following command:
  $ oc get -n <hosted_cluster_namespace> hostedcluster <hosted_cluster_name> -o yaml
6.8. Updating a hosted cluster by using the multicluster engine Operator console
You can update your hosted cluster by using the multicluster engine Operator console.
Before updating a hosted cluster, you must refer to the available and conditional updates of a hosted cluster. Choosing a wrong release version might break the hosted cluster.
Procedure
- Select All clusters.
- Navigate to Infrastructure → Clusters to view managed hosted clusters.
- Click the Upgrade available link to update the control plane and node pools.
6.9. Limitations of managing imported hosted clusters
Hosted clusters are automatically imported into the local multicluster engine for Kubernetes Operator, unlike standalone OpenShift Container Platform clusters or third-party clusters. Hosted clusters run some of their agents in the hosted mode so that the agents do not use the resources of your cluster.
				If you choose to automatically import hosted clusters, you can update node pools and the control plane in hosted clusters by using the HostedCluster resource on the management cluster. To update node pools and a control plane, see "Updating node pools in a hosted cluster" and "Updating a control plane in a hosted cluster".
			
You can import hosted clusters into a location other than the local multicluster engine Operator by using Red Hat Advanced Cluster Management (RHACM). For more information, see "Discovering multicluster engine for Kubernetes Operator hosted clusters in Red Hat Advanced Cluster Management".
In this topology, you must update your hosted clusters by using the command-line interface or the console of the local multicluster engine for Kubernetes Operator where the cluster is hosted. You cannot update the hosted clusters through the RHACM hub cluster.
Chapter 7. Hosted control planes Observability
You can gather metrics for hosted control planes by configuring metrics sets. The HyperShift Operator can create or delete monitoring dashboards in the management cluster for each hosted cluster that it manages.
7.1. Configuring metrics sets for hosted control planes
				Hosted control planes for Red Hat OpenShift Container Platform creates ServiceMonitor resources in each control plane namespace that allow a Prometheus stack to gather metrics from the control planes. The ServiceMonitor resources use metrics relabelings to define which metrics are included or excluded from a particular component, such as etcd or the Kubernetes API server. The number of metrics that are produced by control planes directly impacts the resource requirements of the monitoring stack that gathers them.
			
Instead of producing a fixed number of metrics that apply to all situations, you can configure a metrics set that identifies a set of metrics to produce for each control plane. The following metrics sets are supported:
- Telemetry: These metrics are needed for telemetry. This set is the default set and is the smallest set of metrics.
- SRE: This set includes the necessary metrics to produce alerts and allow the troubleshooting of control plane components.
- All: This set includes all of the metrics that are produced by standalone OpenShift Container Platform control plane components.
				To configure a metrics set, set the METRICS_SET environment variable in the HyperShift Operator deployment by entering the following command:
			
$ oc set env -n hypershift deployment/operator METRICS_SET=All
7.1.1. Configuring the SRE metrics set
					When you specify the SRE metrics set, the HyperShift Operator looks for a config map named sre-metric-set with a single key: config. The value of the config key must contain a set of RelabelConfigs that are organized by control plane component.
				
You can specify the following components:
- etcd
- kubeAPIServer
- kubeControllerManager
- openshiftAPIServer
- openshiftControllerManager
- openshiftRouteControllerManager
- cvo
- olm
- catalogOperator
- registryOperator
- nodeTuningOperator
- controlPlaneOperator
- hostedClusterConfigOperator
					A configuration of the SRE metrics set is illustrated in the following example:
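The original example is not reproduced in this section. As a rough sketch, the value of the config key might be organized as shown below, with RelabelConfigs grouped by component. The regular expressions are illustrative only, the helper name `write_sre_metric_set` is hypothetical, and placing the config map in the `hypershift` namespace is an assumption.

```shell
# Write an illustrative SRE metrics set configuration to the path given as $1.
write_sre_metric_set() {
  cat > "$1" <<'EOF'
kubeAPIServer:
  - action: "drop"
    regex: "etcd_(debugging|disk|server).*"
    sourceLabels: ["__name__"]
etcd:
  - action: "drop"
    regex: "etcd_(debugging|disk|server).*"
    sourceLabels: ["__name__"]
EOF
}

# Usage (not run here):
#   write_sre_metric_set sre-metric-set.yaml
#   oc create configmap sre-metric-set -n hypershift --from-file=config=sre-metric-set.yaml
#   oc set env -n hypershift deployment/operator METRICS_SET=SRE
```

Each top-level key is one of the component names listed above, and each entry under it is a single relabeling rule.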
				
7.2. Enabling monitoring dashboards in a hosted cluster
You can enable monitoring dashboards in a hosted cluster by creating a config map.
Procedure
- Create the hypershift-operator-install-flags config map in the local-cluster namespace. The --monitoring-dashboards --metrics-set=All flag adds the monitoring dashboard for all metrics.
- Wait a couple of minutes for the HyperShift Operator deployment in the hypershift namespace to be updated to include the following environment variable:
    - name: MONITORING_DASHBOARDS
      value: "1"
  When monitoring dashboards are enabled, for each hosted cluster that the HyperShift Operator manages, the Operator creates a config map named cp-<hosted_cluster_namespace>-<hosted_cluster_name> in the openshift-config-managed namespace, where <hosted_cluster_namespace> is the namespace of the hosted cluster and <hosted_cluster_name> is the name of the hosted cluster. As a result, a new dashboard is added in the administrative console of the management cluster.
- To view the dashboard, log in to the management cluster’s console and go to the dashboard for the hosted cluster by clicking Observe → Dashboards.
- Optional: To disable monitoring dashboards in a hosted cluster, remove the --monitoring-dashboards --metrics-set=All flag from the hypershift-operator-install-flags config map. When you delete a hosted cluster, its corresponding dashboard is also deleted.
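The config map from the first step is not shown in this section. The following sketch illustrates one possible shape. The data keys `installFlagsToAdd` and `installFlagsToRemove` are assumptions to verify against your multicluster engine Operator version, and both helper names are hypothetical. The second helper only reproduces the dashboard config map naming rule that this procedure describes.

```shell
# Render a sketch of the hypershift-operator-install-flags config map.
render_install_flags() {
  cat <<'EOF'
kind: ConfigMap
apiVersion: v1
metadata:
  name: hypershift-operator-install-flags
  namespace: local-cluster
data:
  installFlagsToAdd: "--monitoring-dashboards --metrics-set=All"   # assumed key
  installFlagsToRemove: ""                                         # assumed key
EOF
}

# Compute the name of the dashboard config map for one hosted cluster:
# cp-<hosted_cluster_namespace>-<hosted_cluster_name>
dashboard_configmap_name() {
  printf 'cp-%s-%s\n' "$1" "$2"
}

# Usage (not run here):
#   render_install_flags | oc apply -f -
#   oc get configmap "$(dashboard_configmap_name clusters my-cluster)" -n openshift-config-managed
```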
7.2.1. Dashboard customization
					To generate dashboards for each hosted cluster, the HyperShift Operator uses a template that is stored in the monitoring-dashboard-template config map in the Operator namespace (hypershift). This template contains a set of Grafana panels that contain the metrics for the dashboard. You can edit the content of the config map to customize the dashboards.
				
When a dashboard is generated, the following strings are replaced with values that correspond to a specific hosted cluster:
| Name | Description |
|---|---|
| | The name of the hosted cluster |
| | The namespace of the hosted cluster |
| | The namespace where the control plane pods of the hosted cluster are placed |
| | The UUID of the hosted cluster, which matches the |
Chapter 8. High availability for hosted control planes
8.1. Recovering an unhealthy etcd cluster
In a highly available control plane, three etcd pods run as a part of a stateful set in an etcd cluster. To recover an etcd cluster, identify unhealthy etcd pods by checking the etcd cluster health.
8.1.1. Checking the status of an etcd cluster
You can check the status of the etcd cluster health by logging into any etcd pod.
Procedure
- Log in to an etcd pod by entering the following command:
  $ oc rsh -n <hosted_control_plane_namespace> -c etcd <etcd_pod_name>
- Print the health status of an etcd cluster by entering the following command:
  sh-4.4$ etcdctl endpoint health --cluster -w table
  Example output
  ENDPOINT                                                 HEALTH   TOOK         ERROR
  https://etcd-0.etcd-discovery.clusters-hosted.svc:2379   true     9.117698ms
8.1.2. Recovering a failing etcd pod
Each etcd pod of a 3-node cluster has its own persistent volume claim (PVC) to store its data. An etcd pod might fail because of corrupted or missing data. You can recover a failing etcd pod and its PVC.
Procedure
- To confirm that the etcd pod is failing, enter the following command:
  $ oc get pods -l app=etcd -n <hosted_control_plane_namespace>
  Example output
  NAME     READY   STATUS             RESTARTS     AGE
  etcd-0   2/2     Running            0            64m
  etcd-1   2/2     Running            0            45m
  etcd-2   1/2     CrashLoopBackOff   1 (5s ago)   64m
  The failing etcd pod might have the CrashLoopBackOff or Error status.
- Delete the failing pod and its PVC by entering the following command:
  $ oc delete pvc/<etcd_pvc_name> pod/<etcd_pod_name> --wait=false
Verification
- Verify that a new etcd pod is up and running by entering the following command:
  $ oc get pods -l app=etcd -n <hosted_control_plane_namespace>
  Example output
  NAME     READY   STATUS    RESTARTS   AGE
  etcd-0   2/2     Running   0          67m
  etcd-1   2/2     Running   0          48m
  etcd-2   2/2     Running   0          2m2s
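When you check several control planes, it can help to filter the pod listing for pods that are not in the Running status. The following sketch does so with awk, assuming the default column layout shown in the example output. The helper name `failing_etcd_pods` is hypothetical.

```shell
# Read `oc get pods` output on stdin and print the names of pods whose
# STATUS column is not "Running". The header row is skipped.
failing_etcd_pods() {
  awk 'NR > 1 && $3 != "Running" { print $1 }'
}

# Usage (not run here):
#   oc get pods -l app=etcd -n <hosted_control_plane_namespace> | failing_etcd_pods
```

An empty result suggests that every etcd pod reports Running, although the READY column is still worth checking by eye.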
8.2. Backing up and restoring etcd in an on-premise environment
You can back up and restore etcd on a hosted cluster in an on-premise environment to fix failures.
8.2.1. Backing up and restoring etcd on a hosted cluster in an on-premise environment
By backing up and restoring etcd on a hosted cluster, you can fix failures, such as corrupted or missing data in an etcd member of a three-node cluster. If multiple members of the etcd cluster encounter data loss or have a CrashLoopBackOff status, this approach helps prevent an etcd quorum loss.
This procedure requires API downtime.
Prerequisites
- The oc and jq binaries have been installed.
Procedure
- First, set up your environment variables and scale down the API servers:
  - Set up environment variables for your hosted cluster by entering the following commands, replacing values as necessary:
    $ CLUSTER_NAME=my-cluster
    $ HOSTED_CLUSTER_NAMESPACE=clusters
    $ CONTROL_PLANE_NAMESPACE="${HOSTED_CLUSTER_NAMESPACE}-${CLUSTER_NAME}"
  - Pause reconciliation of the hosted cluster by entering the following command, replacing values as necessary:
    $ oc patch -n ${HOSTED_CLUSTER_NAMESPACE} hostedclusters/${CLUSTER_NAME} -p '{"spec":{"pausedUntil":"true"}}' --type=merge
  - Scale down the API servers by entering the following commands:
    - Scale down the kube-apiserver:
      $ oc scale -n ${CONTROL_PLANE_NAMESPACE} deployment/kube-apiserver --replicas=0
    - Scale down the openshift-apiserver:
      $ oc scale -n ${CONTROL_PLANE_NAMESPACE} deployment/openshift-apiserver --replicas=0
    - Scale down the openshift-oauth-apiserver:
      $ oc scale -n ${CONTROL_PLANE_NAMESPACE} deployment/openshift-oauth-apiserver --replicas=0
- Next, take a snapshot of etcd by using one of the following methods:
  - Use a previously backed-up snapshot of etcd.
  - If you have an available etcd pod, take a snapshot from the active etcd pod by completing the following steps:
    - List etcd pods by entering the following command:
      $ oc get -n ${CONTROL_PLANE_NAMESPACE} pods -l app=etcd
    - Take a snapshot of the pod database and save it locally to your machine by entering the following commands:
      $ ETCD_POD=etcd-0
    - Verify that the snapshot is successful by entering the following command:
      $ oc exec -n ${CONTROL_PLANE_NAMESPACE} -c etcd -t ${ETCD_POD} -- env ETCDCTL_API=3 /usr/bin/etcdctl -w table snapshot status /var/lib/snapshot.db
    - Make a local copy of the snapshot by entering the following command:
      $ oc cp -c etcd ${CONTROL_PLANE_NAMESPACE}/${ETCD_POD}:/var/lib/snapshot.db /tmp/etcd.snapshot.db
  - Make a copy of the snapshot database from etcd persistent storage:
    - List etcd pods by entering the following command:
      $ oc get -n ${CONTROL_PLANE_NAMESPACE} pods -l app=etcd
    - Find a pod that is running and set its name as the value of ETCD_POD (for example, ETCD_POD=etcd-0), and then copy its snapshot database by entering the following command:
      $ oc cp -c etcd ${CONTROL_PLANE_NAMESPACE}/${ETCD_POD}:/var/lib/data/member/snap/db /tmp/etcd.snapshot.db
- Next, scale down the etcd statefulset by entering the following command:
  $ oc scale -n ${CONTROL_PLANE_NAMESPACE} statefulset/etcd --replicas=0
  - Delete the volumes for the second and third members by entering the following command:
    $ oc delete -n ${CONTROL_PLANE_NAMESPACE} pvc/data-etcd-1 pvc/data-etcd-2
  - Create a pod to access the first etcd member’s data:
    - Get the etcd image by entering the following command:
      $ ETCD_IMAGE=$(oc get -n ${CONTROL_PLANE_NAMESPACE} statefulset/etcd -o jsonpath='{ .spec.template.spec.containers[0].image }')
    - Create a pod that allows access to etcd data:
    - Check the status of the etcd-data pod and wait for it to be running by entering the following command:
      $ oc get -n ${CONTROL_PLANE_NAMESPACE} pods -l app=etcd-data
    - Get the name of the etcd-data pod by entering the following command:
      $ DATA_POD=$(oc get -n ${CONTROL_PLANE_NAMESPACE} pods --no-headers -l app=etcd-data -o name | cut -d/ -f2)
  - Copy an etcd snapshot into the pod by entering the following command:
    $ oc cp /tmp/etcd.snapshot.db ${CONTROL_PLANE_NAMESPACE}/${DATA_POD}:/var/lib/restored.snap.db
  - Remove old data from the etcd-data pod by entering the following commands:
    $ oc exec -n ${CONTROL_PLANE_NAMESPACE} ${DATA_POD} -- rm -rf /var/lib/data
    $ oc exec -n ${CONTROL_PLANE_NAMESPACE} ${DATA_POD} -- mkdir -p /var/lib/data
  - Restore the etcd snapshot by entering the following command:
  - Remove the temporary etcd snapshot from the pod by entering the following command:
    $ oc exec -n ${CONTROL_PLANE_NAMESPACE} ${DATA_POD} -- rm /var/lib/restored.snap.db
  - Delete the data access deployment by entering the following command:
    $ oc delete -n ${CONTROL_PLANE_NAMESPACE} deployment/etcd-data
  - Scale up the etcd cluster by entering the following command:
    $ oc scale -n ${CONTROL_PLANE_NAMESPACE} statefulset/etcd --replicas=3
  - Wait for the etcd member pods to return and report as available by entering the following command:
    $ oc get -n ${CONTROL_PLANE_NAMESPACE} pods -l app=etcd -w
  - Scale up all etcd-writer deployments by entering the following command:
    $ oc scale deployment -n ${CONTROL_PLANE_NAMESPACE} --replicas=3 kube-apiserver openshift-apiserver openshift-oauth-apiserver
- Restore reconciliation of the hosted cluster by entering the following command. The namespace is the HOSTED_CLUSTER_NAMESPACE variable that you set at the start of this procedure:
  $ oc patch -n ${HOSTED_CLUSTER_NAMESPACE} hostedclusters/${CLUSTER_NAME} -p '{"spec":{"pausedUntil":""}}' --type=merge
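Because the procedure pauses reconciliation at the start and resumes it at the end, it can help to keep both operations behind one helper so that the namespace and cluster name are always passed the same way. The following is a minimal sketch under that assumption. The function names `control_plane_namespace`, `paused_until_patch`, and `set_paused_until` are hypothetical and are not part of `oc`; the `oc patch` invocation mirrors the commands in this procedure.

```shell
# Derive the hosted control plane namespace the same way the procedure does:
# "<hosted cluster namespace>-<cluster name>".
control_plane_namespace() {
  printf '%s-%s\n' "$1" "$2"
}

# Build the merge patch for spec.pausedUntil. Pass "true" to pause,
# or an empty string to resume reconciliation.
paused_until_patch() {
  printf '{"spec":{"pausedUntil":"%s"}}\n' "$1"
}

# Pause or resume reconciliation of a hosted cluster.
# Arguments: hosted cluster namespace, cluster name, pausedUntil value.
set_paused_until() {
  oc patch -n "$1" "hostedclusters/$2" -p "$(paused_until_patch "$3")" --type=merge
}
```

With the variables from this procedure, `set_paused_until "${HOSTED_CLUSTER_NAMESPACE}" "${CLUSTER_NAME}" true` would pause reconciliation, and passing `""` as the last argument would resume it.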
8.3. Backing up and restoring etcd on AWS
You can back up and restore etcd on a hosted cluster on Amazon Web Services (AWS) to fix failures.
Hosted control planes on the AWS platform is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.
For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.
8.3.1. Taking a snapshot of etcd for a hosted cluster
To back up etcd for a hosted cluster, you must take a snapshot of etcd. Later, you can restore etcd by using the snapshot.
This procedure requires API downtime.
Procedure
- Pause reconciliation of the hosted cluster by entering the following command:
  $ oc patch -n clusters hostedclusters/<hosted_cluster_name> -p '{"spec":{"pausedUntil":"true"}}' --type=merge
- Stop all etcd-writer deployments by entering the following command:
  $ oc scale deployment -n <hosted_cluster_namespace> --replicas=0 kube-apiserver openshift-apiserver openshift-oauth-apiserver
- To take an etcd snapshot, use the exec command in each etcd container by entering the following command:
  $ oc exec -it <etcd_pod_name> -n <hosted_cluster_namespace> -- env ETCDCTL_API=3 /usr/bin/etcdctl --cacert /etc/etcd/tls/etcd-ca/ca.crt --cert /etc/etcd/tls/client/etcd-client.crt --key /etc/etcd/tls/client/etcd-client.key --endpoints=localhost:2379 snapshot save /var/lib/data/snapshot.db
- To check the snapshot status, use the exec command in each etcd container by entering the following command:
  $ oc exec -it <etcd_pod_name> -n <hosted_cluster_namespace> -- env ETCDCTL_API=3 /usr/bin/etcdctl -w table snapshot status /var/lib/data/snapshot.db
- Copy the snapshot data to a location where you can retrieve it later, such as an S3 bucket. See the following example.
  Note: The following example uses signature version 2. If you are in a region that supports signature version 4, such as the us-east-2 region, use signature version 4. Otherwise, when copying the snapshot to an S3 bucket, the upload fails.
- To restore the snapshot on a new cluster later, save the encryption secret that the hosted cluster references.
  - Get the secret encryption key by entering the following command:
    $ oc get hostedcluster <hosted_cluster_name> -o=jsonpath='{.spec.secretEncryption.aescbc}'
    {"activeKey":{"name":"<hosted_cluster_name>-etcd-encryption-key"}}
  - Save the secret encryption key by entering the following command:
    $ oc get secret <hosted_cluster_name>-etcd-encryption-key -o=jsonpath='{.data.key}'
    You can decrypt this key when restoring a snapshot on a new cluster.
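The snapshot and status commands must be repeated in each etcd container, which is easy to get wrong by hand. The following sketch wraps the two commands from this procedure in a loop. The function names are hypothetical, the pod names are whatever you pass in, and the `-it` flags are left out on the assumption that the functions run non-interactively.

```shell
# Run the snapshot save and snapshot status commands from the procedure
# in one etcd pod. Arguments: namespace, etcd pod name.
snapshot_etcd_pod() {
  oc exec "$2" -n "$1" -- env ETCDCTL_API=3 /usr/bin/etcdctl \
    --cacert /etc/etcd/tls/etcd-ca/ca.crt \
    --cert /etc/etcd/tls/client/etcd-client.crt \
    --key /etc/etcd/tls/client/etcd-client.key \
    --endpoints=localhost:2379 \
    snapshot save /var/lib/data/snapshot.db &&
    oc exec "$2" -n "$1" -- env ETCDCTL_API=3 /usr/bin/etcdctl \
      -w table snapshot status /var/lib/data/snapshot.db
}

# Repeat for every pod name given after the namespace; stop at the
# first pod whose snapshot or status check fails.
snapshot_all_etcd_pods() {
  snapshot_ns=$1
  shift
  for snapshot_pod in "$@"; do
    snapshot_etcd_pod "$snapshot_ns" "$snapshot_pod" || return 1
  done
}
```

A call such as `snapshot_all_etcd_pods <hosted_cluster_namespace> etcd-0 etcd-1 etcd-2` would then cover a three-member cluster.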
 
Next steps
Restore the etcd snapshot.
8.3.2. Restoring an etcd snapshot on a hosted cluster
If you have a snapshot of etcd from your hosted cluster, you can restore it. Currently, you can restore an etcd snapshot only during cluster creation.
					To restore an etcd snapshot, you modify the output from the create cluster --render command and define a restoreSnapshotURL value in the etcd section of the HostedCluster specification.
				
						The --render flag in the hcp create command does not render the secrets. To render the secrets, you must use both the --render and the --render-sensitive flags in the hcp create command.
					
Prerequisites
You took an etcd snapshot on a hosted cluster.
Procedure
- On the aws command-line interface (CLI), create a pre-signed URL so that you can download your etcd snapshot from S3 without passing credentials to the etcd deployment:
  ETCD_SNAPSHOT=${ETCD_SNAPSHOT:-"s3://${BUCKET_NAME}/${CLUSTER_NAME}-snapshot.db"}
  ETCD_SNAPSHOT_URL=$(aws s3 presign ${ETCD_SNAPSHOT})
- Modify the HostedCluster specification to refer to the URL:
- Ensure that the secret that you referenced from the spec.secretEncryption.aescbc value contains the same AES key that you saved in the previous steps.
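The first step of this procedure relies on the shell's `${VAR:-default}` expansion: an existing ETCD_SNAPSHOT value is kept, and the S3 URI is built from BUCKET_NAME and CLUSTER_NAME only when ETCD_SNAPSHOT is unset or empty. The following sketch isolates that behavior. The function name `snapshot_uri` is hypothetical.

```shell
# Resolve the snapshot location the same way the procedure does:
# keep ETCD_SNAPSHOT if it is already set and non-empty, otherwise
# build the S3 URI from BUCKET_NAME and CLUSTER_NAME.
snapshot_uri() {
  printf '%s\n' "${ETCD_SNAPSHOT:-s3://${BUCKET_NAME}/${CLUSTER_NAME}-snapshot.db}"
}
```

The result can be passed to `aws s3 presign`, as in the procedure, to produce the pre-signed URL.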
8.4. Disaster recovery for a hosted cluster in AWS
You can recover a hosted cluster to the same region within Amazon Web Services (AWS). For example, you need disaster recovery when the upgrade of a management cluster fails and the hosted cluster is in a read-only state.
Hosted control planes is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.
For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.
The disaster recovery process involves the following steps:
- Backing up the hosted cluster on the source management cluster
- Restoring the hosted cluster on a destination management cluster
- Deleting the hosted cluster from the source management cluster
Your workloads remain running during the process. The Cluster API might be unavailable for a period, but that does not affect the services that are running on the worker nodes.
Both the source management cluster and the destination management cluster must have the --external-dns flags to maintain the API server URL. That way, the server URL ends with https://api-sample-hosted.sample-hosted.aws.openshift.com. See the following example:
Example: External DNS flags
--external-dns-provider=aws \
--external-dns-credentials=<path_to_aws_credentials_file> \
--external-dns-domain-filter=<basedomain>
If you do not include the --external-dns flags to maintain the API server URL, you cannot migrate the hosted cluster.
8.4.1. Overview of the backup and restore process
The backup and restore process works as follows:
- On management cluster 1, which you can think of as the source management cluster, the control plane and workers interact by using the external DNS API. The external DNS API is accessible, and a load balancer sits between the management clusters. 
- You take a snapshot of the hosted cluster, which includes etcd, the control plane, and the worker nodes. During this process, the worker nodes continue to try to access the external DNS API even if it is not accessible, the workloads are running, the control plane is saved in a local manifest file, and etcd is backed up to an S3 bucket. The data plane is active and the control plane is paused. 
- On management cluster 2, which you can think of as the destination management cluster, you restore etcd from the S3 bucket and restore the control plane from the local manifest file. During this process, the external DNS API is stopped, the hosted cluster API becomes inaccessible, and any workers that use the API are unable to update their manifest files, but the workloads are still running. 
- The external DNS API is accessible again, and the worker nodes use it to move to management cluster 2. The external DNS API can access the load balancer that points to the control plane. 
- On management cluster 2, the control plane and worker nodes interact by using the external DNS API. The resources are deleted from management cluster 1, except for the S3 backup of etcd. If you try to set up the hosted cluster again on management cluster 1, it will not work.
8.4.2. Backing up a hosted cluster
To recover your hosted cluster in your target management cluster, you first need to back up all of the relevant data.
Procedure
- Create a configmap file to declare the source management cluster by entering this command:
  $ oc create configmap mgmt-parent-cluster -n default --from-literal=from=${MGMT_CLUSTER_NAME}
- Shut down the reconciliation in the hosted cluster and in the node pools by entering these commands:
  $ PAUSED_UNTIL="true"
  $ oc patch -n ${HC_CLUSTER_NS} hostedclusters/${HC_CLUSTER_NAME} -p '{"spec":{"pausedUntil":"'${PAUSED_UNTIL}'"}}' --type=merge
  $ oc patch -n ${HC_CLUSTER_NS} nodepools/${NODEPOOLS} -p '{"spec":{"pausedUntil":"'${PAUSED_UNTIL}'"}}' --type=merge
  $ oc scale deployment -n ${HC_CLUSTER_NS}-${HC_CLUSTER_NAME} --replicas=0 kube-apiserver openshift-apiserver openshift-oauth-apiserver control-plane-operator
- Back up etcd and upload the data to an S3 bucket by running this bash script:
  Tip: Wrap this script in a function and call it from the main function.
  For more information about backing up etcd, see "Backing up and restoring etcd on a hosted cluster".
- Back up Kubernetes and OpenShift Container Platform objects by entering the following commands. You need to back up the following objects:
  - HostedCluster and NodePool objects from the HostedCluster namespace
  - HostedCluster secrets from the HostedCluster namespace
  - HostedControlPlane from the Hosted Control Plane namespace
  - Cluster from the Hosted Control Plane namespace
  - AWSCluster, AWSMachineTemplate, and AWSMachine from the Hosted Control Plane namespace
  - MachineDeployments, MachineSets, and Machines from the Hosted Control Plane namespace
  - ControlPlane secrets from the Hosted Control Plane namespace
- Clean up the ControlPlane routes by entering this command:
  $ oc delete routes -n ${HC_CLUSTER_NS}-${HC_CLUSTER_NAME} --all
  By entering that command, you enable the ExternalDNS Operator to delete the Route53 entries.
- Verify that the Route53 entries are clean by running this script:
Verification
Check all of the OpenShift Container Platform objects and the S3 bucket to verify that everything looks as expected.
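The commands in the disaster recovery procedures depend on shell variables such as HC_CLUSTER_NS, HC_CLUSTER_NAME, MGMT_CLUSTER_NAME, and BACKUP_DIR. An unset variable silently expands to an empty string and changes which namespace or path a command targets. The following is a minimal pre-flight sketch. The function name `require_vars` is hypothetical, and which variables to check is your choice.

```shell
# Return non-zero and list the offenders on stderr if any of the named
# variables is unset or empty. Arguments: variable names (not values).
require_vars() {
  missing=""
  for name in "$@"; do
    # Indirect lookup of the variable called $name; ":-" keeps the
    # expansion safe even when the shell runs with `set -u`.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      missing="$missing $name"
    fi
  done
  if [ -n "$missing" ]; then
    echo "Missing required variables:$missing" >&2
    return 1
  fi
}
```

For example, `require_vars HC_CLUSTER_NS HC_CLUSTER_NAME BACKUP_DIR || exit 1` at the top of a script stops it before any `oc` command runs with an empty namespace.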
Next steps
Restore your hosted cluster.
8.4.3. Restoring a hosted cluster
Gather all of the objects that you backed up and restore them in your destination management cluster.
Prerequisites
You backed up the data from your source management cluster.
Ensure that the KUBECONFIG variable or, if you use the script, the MGMT2_KUBECONFIG variable points to the kubeconfig file of the destination management cluster. Use export KUBECONFIG=<Kubeconfig FilePath> or, if you use the script, use export KUBECONFIG=${MGMT2_KUBECONFIG}.
Procedure
- Verify that the new management cluster does not contain any namespaces from the cluster that you are restoring by entering these commands:
- Re-create the deleted namespaces by entering these commands:
  # Namespace creation
  $ oc new-project ${HC_CLUSTER_NS}
  $ oc new-project ${HC_CLUSTER_NS}-${HC_CLUSTER_NAME}
- Restore the secrets in the HC namespace by entering this command:
  $ oc apply -f ${BACKUP_DIR}/namespaces/${HC_CLUSTER_NS}/secret-*
- Restore the objects in the HostedCluster control plane namespace by entering these commands:
- If you are recovering the nodes and the node pool to reuse AWS instances, restore the objects in the HC control plane namespace by entering these commands:
- Restore the etcd data and the hosted cluster by running this bash script:
- If you are recovering the nodes and the node pool to reuse AWS instances, restore the node pool by entering this command:
  $ oc apply -f ${BACKUP_DIR}/namespaces/${HC_CLUSTER_NS}/np-*
Verification
- To verify that the nodes are fully restored, use this function:
Next steps
Shut down and delete your cluster.
8.4.4. Deleting a hosted cluster from your source management cluster
After you back up your hosted cluster and restore it to your destination management cluster, you shut down and delete the hosted cluster on your source management cluster.
Prerequisites
You backed up your data and restored it to your destination management cluster.
Ensure that the KUBECONFIG variable or, if you use the script, the MGMT_KUBECONFIG variable points to the kubeconfig file of the source management cluster. Use export KUBECONFIG=<Kubeconfig FilePath> or, if you use the script, use export KUBECONFIG=${MGMT_KUBECONFIG}.
Procedure
- Scale the deployment and statefulset objects by entering these commands:
  Important: Do not scale the stateful set if the value of its spec.persistentVolumeClaimRetentionPolicy.whenScaled field is set to Delete, because this could lead to a loss of data.
  As a workaround, update the value of the spec.persistentVolumeClaimRetentionPolicy.whenScaled field to Retain. Ensure that no controllers exist that reconcile the stateful set and would return the value back to Delete, which could lead to a loss of data.
- Delete the NodePool objects by entering these commands:
  NODEPOOLS=$(oc get nodepools -n ${HC_CLUSTER_NS} -o=jsonpath='{.items[?(@.spec.clusterName=="'${HC_CLUSTER_NAME}'")].metadata.name}')
  if [[ -n "${NODEPOOLS}" ]]; then
      oc patch -n "${HC_CLUSTER_NS}" nodepool ${NODEPOOLS} --type=json --patch='[ { "op":"remove", "path": "/metadata/finalizers" }]'
      oc delete np -n ${HC_CLUSTER_NS} ${NODEPOOLS}
  fi
- Delete the machine and machineset objects by entering these commands:
- Delete the cluster object by entering these commands:
  # Cluster
  $ C_NAME=$(oc get cluster -n ${HC_CLUSTER_NS}-${HC_CLUSTER_NAME} -o name)
  $ oc patch -n ${HC_CLUSTER_NS}-${HC_CLUSTER_NAME} ${C_NAME} --type=json --patch='[ { "op":"remove", "path": "/metadata/finalizers" }]'
  $ oc delete cluster.cluster.x-k8s.io -n ${HC_CLUSTER_NS}-${HC_CLUSTER_NAME} --all
- Delete the AWS machines (Kubernetes objects) by entering these commands. Do not worry about deleting the real AWS machines. The cloud instances will not be affected.
- Delete the HostedControlPlane and ControlPlane HC namespace objects by entering these commands:
  # Delete HCP and ControlPlane HC NS
  $ oc patch -n ${HC_CLUSTER_NS}-${HC_CLUSTER_NAME} hostedcontrolplane.hypershift.openshift.io ${HC_CLUSTER_NAME} --type=json --patch='[ { "op":"remove", "path": "/metadata/finalizers" }]'
  $ oc delete hostedcontrolplane.hypershift.openshift.io -n ${HC_CLUSTER_NS}-${HC_CLUSTER_NAME} --all
  $ oc delete ns ${HC_CLUSTER_NS}-${HC_CLUSTER_NAME} || true
- Delete the HostedCluster and HC namespace objects by entering these commands:
  # Delete HC and HC Namespace
  $ oc -n ${HC_CLUSTER_NS} patch hostedclusters ${HC_CLUSTER_NAME} -p '{"metadata":{"finalizers":null}}' --type merge || true
  $ oc delete hc -n ${HC_CLUSTER_NS} ${HC_CLUSTER_NAME} || true
  $ oc delete ns ${HC_CLUSTER_NS} || true
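Several steps in this procedure follow the same pattern: remove the finalizers from an object with a JSON patch, and then delete the object. The following sketch factors that pattern into one helper. The function name `strip_finalizers_and_delete` and the variable `REMOVE_FINALIZERS_PATCH` are hypothetical; the patch body and the `oc patch` and `oc delete` calls are the ones used in the steps above.

```shell
# JSON patch used throughout the procedure to drop all finalizers.
REMOVE_FINALIZERS_PATCH='[ { "op":"remove", "path": "/metadata/finalizers" }]'

# Remove the finalizers from one object, and then delete it.
# Arguments: namespace, resource type, object name.
strip_finalizers_and_delete() {
  # The patch fails if the object has no finalizers, so its failure is
  # ignored, in the same way that some steps above append "|| true".
  oc patch -n "$1" "$2" "$3" --type=json --patch="$REMOVE_FINALIZERS_PATCH" || true
  oc delete -n "$1" "$2" "$3"
}
```

For example, `strip_finalizers_and_delete "${HC_CLUSTER_NS}" nodepool "${NODEPOOLS}"` corresponds to the body of the NodePool step when a single node pool name is set.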
Verification
- To verify that everything works, enter these commands:
Next steps
Delete the OVN pods in the hosted cluster so that you can connect to the new OVN control plane that runs in the new management cluster:
- Load the KUBECONFIG environment variable with the hosted cluster’s kubeconfig path.
- Enter this command:
  $ oc delete pod -n openshift-ovn-kubernetes --all
Chapter 9. Troubleshooting hosted control planes
If you encounter issues with hosted control planes, see the following information to guide you through troubleshooting.
9.1. Gathering information to troubleshoot hosted control planes
				When you need to troubleshoot an issue with hosted control plane clusters, you can gather information by running the must-gather command. The command generates output for the management cluster and the hosted cluster.
			
The output for the management cluster contains the following content:
- Cluster-scoped resources: These resources are node definitions of the management cluster.
- The hypershift-dump compressed file: This file is useful if you need to share the content with other people.
- Namespaced resources: These resources include all of the objects from the relevant namespaces, such as config maps, services, events, and logs.
- Network logs: These logs include the OVN northbound and southbound databases and the status for each one.
- Hosted clusters: This level of output involves all of the resources inside of the hosted cluster.
The output for the hosted cluster contains the following content:
- Cluster-scoped resources: These resources include all of the cluster-wide objects, such as nodes and CRDs.
- Namespaced resources: These resources include all of the objects from the relevant namespaces, such as config maps, services, events, and logs.
Although the output does not contain any secret objects from the cluster, it can contain references to the names of secrets.
Prerequisites
- You must have cluster-admin access to the management cluster.
- You need the name value for the HostedCluster resource and the namespace where the CR is deployed.
- You must have the hcp command line interface installed. For more information, see Installing the hosted control planes command line interface.
- You must have the OpenShift CLI (oc) installed.
- You must ensure that the kubeconfig file is loaded and is pointing to the management cluster.
Procedure
- To gather the output for troubleshooting, enter the following command:
  $ oc adm must-gather --image=registry.redhat.io/multicluster-engine/must-gather-rhel9:v<mce_version> \
      /usr/bin/gather hosted-cluster-namespace=HOSTEDCLUSTERNAMESPACE hosted-cluster-name=HOSTEDCLUSTERNAME \
      --dest-dir=NAME ; tar -cvzf NAME.tgz NAME
  where:
  - You replace <mce_version> with the version of multicluster engine Operator that you are using; for example, 2.4.
  - The hosted-cluster-namespace=HOSTEDCLUSTERNAMESPACE parameter is optional. If you do not include it, the command runs as though the hosted cluster is in the default namespace, which is clusters.
  - The --dest-dir=NAME parameter is optional. Specify that parameter if you want to save the results of the command to a compressed file, replacing NAME with the name of the directory where you want to save the results.
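The following is a minimal sketch of how the must-gather invocation could be assembled from variables before you run it. The function name build_must_gather_cmd and the sample values (my-hosted-cluster, hc-dump) are assumptions for illustration only; the flags and arguments are the ones shown in the procedure. The script only prints the command so that you can review it first.

```shell
# Sketch only: build_must_gather_cmd is an illustrative helper, not part of oc or hcp.
build_must_gather_cmd() {
  local mce_version="$1"               # multicluster engine Operator version, for example 2.4
  local hc_name="$2"                   # name value of the HostedCluster resource
  local hc_namespace="${3:-clusters}"  # falls back to the default namespace, clusters
  local dest_dir="${4:-}"              # optional directory for the results

  local cmd="oc adm must-gather"
  cmd+=" --image=registry.redhat.io/multicluster-engine/must-gather-rhel9:v${mce_version}"
  cmd+=" /usr/bin/gather"
  cmd+=" hosted-cluster-namespace=${hc_namespace}"
  cmd+=" hosted-cluster-name=${hc_name}"
  # Append --dest-dir only when a directory was supplied.
  if [ -n "${dest_dir}" ]; then
    cmd+=" --dest-dir=${dest_dir}"
  fi
  printf '%s\n' "${cmd}"
}

# Print the assembled command for review (hypothetical values).
build_must_gather_cmd 2.4 my-hosted-cluster clusters hc-dump
```

Printing the command instead of executing it keeps the sketch safe to run on a workstation that is not logged in to the management cluster.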
9.2. Pausing the reconciliation of a hosted cluster and hosted control plane
If you are a cluster instance administrator, you can pause the reconciliation of a hosted cluster and hosted control plane. You might want to pause reconciliation when you back up and restore an etcd database or when you need to debug problems with a hosted cluster or hosted control plane.
Procedure
- To pause reconciliation for a hosted cluster and hosted control plane, populate the pausedUntil field of the HostedCluster resource.
  - To pause the reconciliation until a specific time, enter the following command:
    $ oc patch -n <hosted_cluster_namespace> hostedclusters/<hosted_cluster_name> -p '{"spec":{"pausedUntil":"<timestamp>"}}' --type=merge
    For <timestamp>, specify a timestamp in the RFC3339 format, for example, 2024-03-03T03:28:48Z. The reconciliation is paused until the specified time has passed.
  - To pause the reconciliation indefinitely, enter the following command:
    $ oc patch -n <hosted_cluster_namespace> hostedclusters/<hosted_cluster_name> -p '{"spec":{"pausedUntil":"true"}}' --type=merge
    The reconciliation is paused until you remove the field from the HostedCluster resource.
  When the pause reconciliation field is populated for the HostedCluster resource, the field is automatically added to the associated HostedControlPlane resource.
- To remove the pausedUntil field, enter the following patch command:
  $ oc patch -n <hosted_cluster_namespace> hostedclusters/<hosted_cluster_name> -p '{"spec":{"pausedUntil":null}}' --type=merge
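As an illustration of the timestamp format, the following sketch generates an RFC3339 timestamp and the matching merge patch for the pausedUntil field. It assumes GNU date (for the -d option); the helper names rfc3339_from_epoch and pause_patch and the two-hour window are hypothetical choices, not part of the product.

```shell
# Sketch only: helper names are illustrative. Assumes GNU date.
rfc3339_from_epoch() {
  # Format a Unix epoch (in seconds) as an RFC3339 timestamp in UTC.
  date -u -d "@$1" +%Y-%m-%dT%H:%M:%SZ
}

pause_patch() {
  # Emit the JSON merge patch that sets spec.pausedUntil to the given value.
  printf '{"spec":{"pausedUntil":"%s"}}\n' "$1"
}

# Example: compute a timestamp two hours from now and print the patch command.
until_ts="$(rfc3339_from_epoch "$(( $(date +%s) + 2 * 3600 ))")"
echo "oc patch -n <hosted_cluster_namespace> hostedclusters/<hosted_cluster_name> -p '$(pause_patch "${until_ts}")' --type=merge"
```

Passing true to pause_patch yields the patch for an indefinite pause.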
9.3. Scaling down the data plane to zero
If you are not using the hosted control plane, you can scale down the data plane to zero to save resources and cost.
Ensure that you are prepared to scale down the data plane to zero, because the workload on the worker nodes disappears after you scale down.
Procedure
- Set the kubeconfig file to access the hosted cluster by running the following command:
  $ export KUBECONFIG=<install_directory>/auth/kubeconfig
- Get the name of the NodePool resource associated with your hosted cluster by running the following command:
  $ oc get nodepool --namespace <HOSTED_CLUSTER_NAMESPACE>
- Optional: To prevent the pods from draining, add the nodeDrainTimeout field in the NodePool resource by running the following command:
  $ oc edit nodepool <nodepool_name> --namespace <hosted_cluster_namespace>
  Note: To allow the node draining process to continue for a certain period of time, you can set the value of the nodeDrainTimeout field accordingly, for example, nodeDrainTimeout: 1m.
- Scale down the NodePool resource associated with your hosted cluster by running the following command:
  $ oc scale nodepool/<NODEPOOL_NAME> --namespace <HOSTED_CLUSTER_NAMESPACE> --replicas=0
  Note: After you scale down the data plane to zero, some pods in the control plane stay in the Pending status and the hosted control plane stays up and running. If necessary, you can scale up the NodePool resource.
- Optional: Scale up the NodePool resource associated with your hosted cluster by running the following command:
  $ oc scale nodepool/<NODEPOOL_NAME> --namespace <HOSTED_CLUSTER_NAMESPACE> --replicas=1
  After rescaling the NodePool resource, wait a couple of minutes for the NodePool resource to become available in a Ready state.
Verification
- Verify that the value for the nodeDrainTimeout field is greater than 0s by running the following command:
  $ oc get nodepool -n <hosted_cluster_namespace> <nodepool_name> -ojsonpath='{.spec.nodeDrainTimeout}'
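If you want to script the verification, a small check such as the following sketch can interpret the value that the command returns. The helper name drain_timeout_is_positive is hypothetical, and the check assumes that the value is a duration string such as 1m or 30s, so that any nonzero digit means a value greater than 0s.

```shell
# Sketch only: drain_timeout_is_positive is an illustrative helper, not an oc feature.
drain_timeout_is_positive() {
  # A duration such as 1m or 30s contains at least one nonzero digit;
  # an empty value or 0s does not.
  case "$1" in
    *[1-9]*) return 0 ;;
    *)       return 1 ;;
  esac
}

# Against a live cluster, the value would come from the verification command:
#   value="$(oc get nodepool -n <hosted_cluster_namespace> <nodepool_name> -ojsonpath='{.spec.nodeDrainTimeout}')"
# The sample values here stand in for that output.
for value in 1m 0s ""; do
  if drain_timeout_is_positive "${value}"; then
    echo "'${value}': greater than 0s"
  else
    echo "'${value}': not set or 0s"
  fi
done
```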
Legal Notice
Copyright © 2025 Red Hat
OpenShift documentation is licensed under the Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0).
Modified versions must remove all Red Hat trademarks.
Portions adapted from https://github.com/kubernetes-incubator/service-catalog/ with modifications by Red Hat.
Red Hat, Red Hat Enterprise Linux, the Red Hat logo, the Shadowman logo, JBoss, OpenShift, Fedora, the Infinity logo, and RHCE are trademarks of Red Hat, Inc., registered in the United States and other countries.
Linux® is the registered trademark of Linus Torvalds in the United States and other countries.
Java® is a registered trademark of Oracle and/or its affiliates.
XFS® is a trademark of Silicon Graphics International Corp. or its subsidiaries in the United States and/or other countries.
MySQL® is a registered trademark of MySQL AB in the United States, the European Union and other countries.
Node.js® is an official trademark of Joyent. Red Hat Software Collections is not formally related to or endorsed by the official Joyent Node.js open source or commercial project.
The OpenStack® Word Mark and OpenStack logo are either registered trademarks/service marks or trademarks/service marks of the OpenStack Foundation, in the United States and other countries and are used with the OpenStack Foundation’s permission. We are not affiliated with, endorsed or sponsored by the OpenStack Foundation, or the OpenStack community.
All other trademarks are the property of their respective owners.