Search

Chapter 2. Preparing to deploy hosted control planes

download PDF

2.1. Requirements for hosted control planes

The hosting cluster and workers must run on the same infrastructure. For example, you cannot run your hosting cluster on bare metal and your workers on the cloud. However, the hub cluster and workers do not need to run on the same platform. For example, you might run your hosting cluster on bare metal and workers on OpenShift Virtualization.

The control plane is associated with a hosted cluster and runs as pods in a single namespace. When the cluster service consumer creates a hosted cluster, it creates a worker node that is independent of the control plane.

2.1.1. Platform and version requirements for Hosted control planes

The following table indicates which OpenShift Container Platform versions are supported for each platform. In the table, Hosting OpenShift Container Platform version refers to the OpenShift Container Platform version where the multicluster engine Operator is enabled:

Table 2.1. Required OpenShift Container Platform versions for platforms
PlatformHosting OpenShift Container Platform versionHosted OpenShift Container Platform version

Amazon Web Services

4.11 - 4.16

4.14 - 4.16 (only)

IBM Power

4.16

4.16 (only)

IBM Z

4.16

4.16 (only)

OpenShift Virtualization

4.14 - 4.16

4.14 - 4.16 (only)

Bare metal

4.14 - 4.16

4.14 - 4.16 (only)

Non-bare metal agent machines

4.16

4.16 (only)

2.2. Sizing guidance for hosted control planes

Many factors, including hosted cluster workload and worker node count, affect how many hosted clusters can fit within a certain number of control-plane nodes. Use this sizing guide to help with hosted cluster capacity planning. This guidance assumes a highly available hosted control planes topology. The load-based sizing examples were measured on a bare-metal cluster. Cloud-based instances might have different limiting factors, such as memory size.

You can override the following resource utilization sizing measurements and disable the metric service monitoring.

See the following highly available hosted control planes requirements, which were tested with OpenShift Container Platform version 4.12.9 and later:

  • 78 pods
  • Three 8 GiB PVs for etcd
  • Minimum vCPU: approximately 5.5 cores
  • Minimum memory: approximately 19 GiB

Additional resources

2.2.1. Pod limits

The maxPods setting for each node affects how many hosted clusters can fit in a control-plane node. It is important to note the maxPods value on all control-plane nodes. Plan for about 75 pods for each highly available hosted control plane.

For bare-metal nodes, the default maxPods setting of 250 is likely to be a limiting factor because roughly three hosted control planes fit for each node given the pod requirements, even if the machine has plenty of resources to spare. Setting the maxPods value to 500 by configuring the KubeletConfig value allows for greater hosted control plane density, which can help you take advantage of additional compute resources.

Additional resources

2.2.2. Request-based resource limit

The maximum number of hosted control planes that the cluster can host is calculated based on the hosted control plane CPU and memory requests from the pods.

A highly available hosted control plane consists of 78 pods that request 5 vCPUs and 18 GB memory. These baseline numbers are compared to the cluster worker node resource capacities to estimate the maximum number of hosted control planes.

2.2.3. Load-based limit

The maximum number of hosted control planes that the cluster can host is calculated based on the hosted control plane pods CPU and memory utilizations when some workload is put on the hosted control plane Kubernetes API server.

The following method is used to measure the hosted control plane resource utilizations as the workload increases:

  • A hosted cluster with 9 workers that use 8 vCPU and 32 GiB each, while using the KubeVirt platform
  • The workload test profile that is configured to focus on API control-plane stress, based on the following definition:

    • Created objects for each namespace, scaling up to 100 namespaces total
    • Additional API stress with continuous object deletion and creation
    • Workload queries-per-second (QPS) and Burst settings set high to remove any client-side throttling

As the load increases by 1000 QPS, the hosted control plane resource utilization increases by 9 vCPUs and 2.5 GB memory.

For general sizing purposes, consider the 1000 QPS API rate that is a medium hosted cluster load, and a 2000 QPS API that is a heavy hosted cluster load.

Note

This test provides an estimation factor to increase the compute resource utilization based on the expected API load. Exact utilization rates can vary based on the type and pace of the cluster workload.

Table 2.2. Load table
Hosted control plane resource utilization scalingvCPUsMemory (GiB)

Resource utilization with no load

2.9

11.1

Resource utilization with 1000 QPS

9.0

2.5

As the load increases by 1000 QPS, the hosted control plane resource utilization increases by 9 vCPUs and 2.5 GB memory.

For general sizing purposes, consider a 1000 QPS API rate to be a medium hosted cluster load and a 2000 QPS API to be a heavy hosted cluster load.

The following example shows hosted control plane resource scaling for the workload and API rate definitions:

Table 2.3. API rate table
QPS (API rate)vCPU usageMemory usage (GiB)

Low load (Less than 50 QPS)

2.9

11.1

Medium load (1000 QPS)

11.9

13.6

High load (2000 QPS)

20.9

16.1

The hosted control plane sizing is about control-plane load and workloads that cause heavy API activity, etcd activity, or both. Hosted pod workloads that focus on data-plane loads, such as running a database, might not result in high API rates.

2.2.4. Sizing calculation example

This example provides sizing guidance for the following scenario:

  • Three bare-metal workers that are labeled as hypershift.openshift.io/control-plane nodes
  • maxPods value set to 500
  • The expected API rate is medium or about 1000, according to the load-based limits
Table 2.4. Limit inputs
Limit descriptionServer 1Server 2

Number of vCPUs on worker node

64

128

Memory on worker node (GiB)

128

256

Maximum pods per worker

500

500

Number of workers used to host control planes

3

3

Maximum QPS target rate (API requests per second)

1000

1000

Table 2.5. Sizing calculation example

Calculated values based on worker node size and API rate

Server 1

Server 2

Calculation notes

Maximum hosted control planes per worker based on vCPU requests

12.8

25.6

Number of worker vCPUs ÷ 5 total vCPU requests per hosted control plane

Maximum hosted control planes per worker based on vCPU usage

5.4

10.7

Number of vCPUS ÷ (2.9 measured idle vCPU usage + (QPS target rate ÷ 1000) × 9.0 measured vCPU usage per 1000 QPS increase)

Maximum hosted control planes per worker based on memory requests

7.1

14.2

Worker memory GiB ÷ 18 GiB total memory request per hosted control plane

Maximum hosted control planes per worker based on memory usage

9.4

18.8

Worker memory GiB ÷ (11.1 measured idle memory usage + (QPS target rate ÷ 1000) × 2.5 measured memory usage per 1000 QPS increase)

Maximum hosted control planes per worker based on per node pod limit

6.7

6.7

500 maxPods ÷ 75 pods per hosted control plane

Minimum of previously mentioned maximums

5.4

6.7

 
 

vCPU limiting factor

maxPods limiting factor

 

Maximum number of hosted control planes within a management cluster

16

20

Minimum of previously mentioned maximums × 3 control-plane workers

Table 2.6. Hosted control planes capacity metrics

Name

Description

mce_hs_addon_request_based_hcp_capacity_gauge

Estimated maximum number of hosted control planes the cluster can host based on a highly available hosted control planes resource request.

mce_hs_addon_low_qps_based_hcp_capacity_gauge

Estimated maximum number of hosted control planes the cluster can host if all hosted control planes make around 50 QPS to the clusters Kube API server.

mce_hs_addon_medium_qps_based_hcp_capacity_gauge

Estimated maximum number of hosted control planes the cluster can host if all hosted control planes make around 1000 QPS to the clusters Kube API server.

mce_hs_addon_high_qps_based_hcp_capacity_gauge

Estimated maximum number of hosted control planes the cluster can host if all hosted control planes make around 2000 QPS to the clusters Kube API server.

mce_hs_addon_average_qps_based_hcp_capacity_gauge

Estimated maximum number of hosted control planes the cluster can host based on the existing average QPS of hosted control planes. If you do not have an active hosted control planes, you can expect low QPS.

2.3. Overriding resource utilization measurements

The set of baseline measurements for resource utilization can vary in each hosted cluster.

2.3.1. Overriding resource utilization measurements for a hosted cluster

You can override resource utilization measurements based on the type and pace of your cluster workload.

Procedure

  1. Create the ConfigMap resource by running the following command:

    $ oc create -f <your-config-map-file.yaml>

    Replace <your-config-map-file.yaml> with the name of your YAML file that contains your hcp-sizing-baseline config map.

  2. Create the hcp-sizing-baseline config map in the local-cluster namespace to specify the measurements you want to override. Your config map might resemble the following YAML file:

    kind: ConfigMap
    apiVersion: v1
    metadata:
      name: hcp-sizing-baseline
      namespace: local-cluster
    data:
      incrementalCPUUsagePer1KQPS: "9.0"
      memoryRequestPerHCP: "18"
      minimumQPSPerHCP: "50.0"
  3. Delete the hypershift-addon-agent deployment to restart the hypershift-addon-agent pod by running the following command:

    $ oc delete deployment hypershift-addon-agent -n open-cluster-management-agent-addon

Verification

  • Observe the hypershift-addon-agent pod logs. Verify that the overridden measurements are updated in the config map by running the following command:

    $ oc logs hypershift-addon-agent -n open-cluster-management-agent-addon

    Your logs might resemble the following output:

    Example output

    2024-01-05T19:41:05.392Z	INFO	agent.agent-reconciler	agent/agent.go:793	setting cpuRequestPerHCP to 5
    2024-01-05T19:41:05.392Z	INFO	agent.agent-reconciler	agent/agent.go:802	setting memoryRequestPerHCP to 18
    2024-01-05T19:53:54.070Z	INFO	agent.agent-reconciler	agent/hcp_capacity_calculation.go:141	The worker nodes have 12.000000 vCPUs
    2024-01-05T19:53:54.070Z	INFO	agent.agent-reconciler	agent/hcp_capacity_calculation.go:142	The worker nodes have 49.173369 GB memory

    If the overridden measurements are not updated properly in the hcp-sizing-baseline config map, you might see the following error message in the hypershift-addon-agent pod logs:

    Example error

    2024-01-05T19:53:54.052Z	ERROR	agent.agent-reconciler	agent/agent.go:788	failed to get configmap from the hub. Setting the HCP sizing baseline with default values.	{"error": "configmaps \"hcp-sizing-baseline\" not found"}

2.3.2. Disabling the metric service monitoring

After you enable the hypershift-addon managed cluster add-on, metric service monitoring is configured by default so that OpenShift Container Platform monitoring can gather metrics from hypershift-addon.

Procedure

You can disable metric service monitoring by completing the following steps:

  1. Log in to your hub cluster by running the following command:

    $ oc login
  2. Edit the hypershift-addon-deploy-config add-on deployment configuration specification by running the following command:

    $ oc edit addondeploymentconfig hypershift-addon-deploy-config -n multicluster-engine
  3. Add the disableMetrics=true customized variable to the specification, as shown in the following example:

    apiVersion: addon.open-cluster-management.io/v1alpha1
    kind: AddOnDeploymentConfig
    metadata:
      name: hypershift-addon-deploy-config
      namespace: multicluster-engine
    spec:
      customizedVariables:
      - name: hcMaxNumber
        value: "80"
      - name: hcThresholdNumber
        value: "60"
      - name: disableMetrics 1
        value: "true"
    1
    The disableMetrics=true customized variable disables metric service monitoring for both new and existing hypershift-addon managed cluster add-ons.
  4. Apply the changes to the configuration specification by running the following command:

    $ oc apply -f <filename>.yaml

2.4. Installing the hosted control planes command-line interface

The hosted control planes command-line interface, hcp, is a tool that you can use to get started with hosted control planes. For Day 2 operations, such as management and configuration, use GitOps or your own automation tool.

2.4.1. Installing the hosted control planes command-line interface by using the CLI

You can install the hosted control planes command-line interface (CLI), hcp, by using the CLI.

Procedure

  1. Get the URL to download the hcp binary by running the following command:

    $ oc get ConsoleCLIDownload hcp-cli-download -o json | jq -r ".spec"
  2. Download the hcp binary by running the following command:

    $ wget <hcp_cli_download_url> 1
    1
    Replace hcp_cli_download_url with the URL that you obtained from the previous step.
  3. Unpack the downloaded archive by running the following command:

    $ tar xvzf hcp.tar.gz
  4. Make the hcp binary file executable by running the following command:

    $ chmod +x hcp
  5. Move the hcp binary file to a directory in your path by running the following command:

    $ sudo mv hcp /usr/local/bin/.

Verification

  • Verify that you see the list of available parameters by running the following command:

    $ hcp create cluster <platform> --help 1
    1
    You can use the hcp create cluster command to create and manage hosted clusters. The supported platforms are aws, agent, and kubevirt.

2.4.2. Installing the hosted control planes command-line interface by using the web console

You can install the hosted control planes command-line interface (CLI), hcp, by using the OpenShift Container Platform web console.

Procedure

  1. From the OpenShift Container Platform web console, click the Help icon Command Line Tools.
  2. Click Download hcp CLI for your platform.
  3. Unpack the downloaded archive by running the following command:

    $ tar xvzf hcp.tar.gz
  4. Run the following command to make the binary file executable:

    $ chmod +x hcp
  5. Run the following command to move the binary file to a directory in your path:

    $ sudo mv hcp /usr/local/bin/.

Verification

  • Verify that you see the list of available parameters by running the following command:

    $ hcp create cluster <platform> --help 1
    1
    You can use the hcp create cluster command to create and manage hosted clusters. The supported platforms are aws, agent, and kubevirt.

2.4.3. Installing the hosted control planes command-line interface by using the content gateway

You can install the hosted control planes command-line interface (CLI), hcp, by using the content gateway.

Procedure

  1. Navigate to the content gateway and download the hcp binary.
  2. Unpack the downloaded archive by running the following command:

    $ tar xvzf hcp.tar.gz
  3. Make the hcp binary file executable by running the following command:

    $ chmod +x hcp
  4. Move the hcp binary file to a directory in your path by running the following command:

    $ sudo mv hcp /usr/local/bin/.

Verification

  • Verify that you see the list of available parameters by running the following command:

    $ hcp create cluster <platform> --help 1
    1
    You can use the hcp create cluster command to create and manage hosted clusters. The supported platforms are aws, agent, and kubevirt.

2.5. Distributing hosted cluster workloads

Before you get started with hosted control planes for OpenShift Container Platform, you must properly label nodes so that the pods of hosted clusters can be scheduled into infrastructure nodes. Node labeling is also important for the following reasons:

  • To ensure high availability and proper workload deployment. For example, you can set the node-role.kubernetes.io/infra label to avoid having the control-plane workload count toward your OpenShift Container Platform subscription.
  • To ensure that control plane workloads are separate from other workloads in the management cluster.
Important

Do not use the management cluster for your workload. Workloads must not run on nodes where control planes run.

2.5.1. Labeling management cluster nodes

Proper node labeling is a prerequisite to deploying hosted control planes.

As a management cluster administrator, you use the following labels and taints in management cluster nodes to schedule a control plane workload:

  • hypershift.openshift.io/control-plane: true: Use this label and taint to dedicate a node to running hosted control plane workloads. By setting a value of true, you avoid sharing the control plane nodes with other components, for example, the infrastructure components of the management cluster or any other mistakenly deployed workload.
  • hypershift.openshift.io/cluster: ${HostedControlPlane Namespace}: Use this label and taint when you want to dedicate a node to a single hosted cluster.

Apply the following labels on the nodes that host control-plane pods:

  • node-role.kubernetes.io/infra: Use this label to avoid having the control-plane workload count toward your subscription.
  • topology.kubernetes.io/zone: Use this label on the management cluster nodes to deploy highly available clusters across failure domains. The zone might be a location, rack name, or the hostname of the node where the zone is set. For example, a management cluster has the following nodes: worker-1a, worker-1b, worker-2a, and worker-2b. The worker-1a and worker-1b nodes are in rack1, and the worker-2a and worker-2b nodes are in rack2. To use each rack as an availability zone, enter the following commands:

    $ oc label node/worker-1a node/worker-1b topology.kubernetes.io/zone=rack1
    $ oc label node/worker-2a node/worker-2b topology.kubernetes.io/zone=rack2

Pods for a hosted cluster have tolerations, and the scheduler uses affinity rules to schedule them. Pods tolerate taints for control-plane and the cluster for the pods. The scheduler prioritizes the scheduling of pods into nodes that are labeled with hypershift.openshift.io/control-plane and hypershift.openshift.io/cluster: ${HostedControlPlane Namespace}.

For the ControllerAvailabilityPolicy option, use HighlyAvailable, which is the default value that the hosted control planes command line interface, hcp, deploys. When you use that option, you can schedule pods for each deployment within a hosted cluster across different failure domains by setting topology.kubernetes.io/zone as the topology key. Control planes that are not highly available are not supported.

Procedure

To enable a hosted cluster to require its pods to be scheduled into infrastructure nodes, set HostedCluster.spec.nodeSelector, as shown in the following example:

  spec:
    nodeSelector:
      role.kubernetes.io/infra: ""

This way, hosted control planes for each hosted cluster are eligible infrastructure node workloads, and you do not need to entitle the underlying OpenShift Container Platform nodes.

2.5.2. Priority classes

Four built-in priority classes influence the priority and preemption of the hosted cluster pods. You can create the pods in the management cluster in the following order from highest to lowest:

  • hypershift-operator: HyperShift Operator pods.
  • hypershift-etcd: Pods for etcd.
  • hypershift-api-critical: Pods that are required for API calls and resource admission to succeed. These pods include pods such as kube-apiserver, aggregated API servers, and web hooks.
  • hypershift-control-plane: Pods in the control plane that are not API-critical but still need elevated priority, such as the cluster version Operator.

2.5.3. Custom taints and tolerations

For hosted control planes on OpenShift Virtualization, by default, pods for a hosted cluster tolerate the control-plane and cluster taints. However, you can also use custom taints on nodes so that hosted clusters can tolerate those taints on a per-hosted-cluster basis by setting HostedCluster.spec.tolerations.

Important

Passing tolerations for a hosted cluster is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.

For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.

Example configuration

  spec:
    tolerations:
    - effect: NoSchedule
      key: kubernetes.io/custom
      operator: Exists

You can also set tolerations on the hosted cluster while you create a cluster by using the --tolerations hcp CLI argument.

Example CLI argument

--toleration="key=kubernetes.io/custom,operator=Exists,effect=NoSchedule"

For fine granular control of hosted cluster pod placement on a per-hosted-cluster basis, use custom tolerations with nodeSelectors. You can co-locate groups of hosted clusters and isolate them from other hosted clusters. You can also place hosted clusters in infra and control plane nodes.

Tolerations on the hosted cluster spread only to the pods of the control plane. To configure other pods that run on the management cluster and infrastructure-related pods, such as the pods to run virtual machines, you need to use a different process.

2.6. Enabling or disabling the hosted control planes feature

The hosted control planes feature, as well as the hypershift-addon managed cluster add-on, are enabled by default. If you want to disable the feature, or if you disabled it and want to manually enable it, see the following procedures.

2.6.1. Manually enabling the hosted control planes feature

If you need to manually enable hosted control planes, complete the following steps.

Procedure

  1. Run the following command to enable the feature:

    $ oc patch mce multiclusterengine --type=merge -p '{"spec":{"overrides":{"components":[{"name":"hypershift","enabled": true}]}}}' 1
    1
    The default MultiClusterEngine resource instance name is multiclusterengine, but you can get the MultiClusterEngine name from your cluster by running the following command: $ oc get mce.
  2. Run the following command to verify that the hypershift and hypershift-local-hosting features are enabled in the MultiClusterEngine custom resource:

    $ oc get mce multiclusterengine -o yaml 1
    1
    The default MultiClusterEngine resource instance name is multiclusterengine, but you can get the MultiClusterEngine name from your cluster by running the following command: $ oc get mce.

    Example output

    apiVersion: multicluster.openshift.io/v1
    kind: MultiClusterEngine
    metadata:
      name: multiclusterengine
    spec:
      overrides:
        components:
        - name: hypershift
          enabled: true
        - name: hypershift-local-hosting
          enabled: true

2.6.1.1. Manually enabling the hypershift-addon managed cluster add-on for local-cluster

Enabling the hosted control planes feature automatically enables the hypershift-addon managed cluster add-on. If you need to enable the hypershift-addon managed cluster add-on manually, complete the following steps to use the hypershift-addon to install the HyperShift Operator on local-cluster.

Procedure

  1. Create the ManagedClusterAddon HyperShift add-on by creating a file that resembles the following example:

    apiVersion: addon.open-cluster-management.io/v1alpha1
    kind: ManagedClusterAddOn
    metadata:
      name: hypershift-addon
      namespace: local-cluster
    spec:
      installNamespace: open-cluster-management-agent-addon
  2. Apply the file by running the following command:

    $ oc apply -f <filename>

    Replace filename with the name of the file that you created.

  3. Confirm that the hypershift-addon is installed by running the following command:

    $ oc get managedclusteraddons -n local-cluster hypershift-addon

    If the add-on is installed, the output resembles the following example:

    NAME               AVAILABLE   DEGRADED   PROGRESSING
    hypershift-addon   True

Your HyperShift add-on is installed and the hosting cluster is available to create and manage hosted clusters.

2.6.2. Disabling the hosted control planes feature

You can uninstall the HyperShift Operator and disable the hosted control planes feature. When you disable the hosted control planes feature, you must destroy the hosted cluster and the managed cluster resource on multicluster engine Operator, as described in the Managing hosted clusters topics.

2.6.2.1. Uninstalling the HyperShift Operator

To uninstall the HyperShift Operator and disable the hypershift-addon from the local-cluster, complete the following steps:

Procedure

  1. Run the following command to ensure that there is no hosted cluster running:

    $ oc get hostedcluster -A
    Important

    If a hosted cluster is running, the HyperShift Operator does not uninstall, even if the hypershift-addon is disabled.

  2. Disable the hypershift-addon by running the following command:

    $ oc patch mce multiclusterengine --type=merge -p '{"spec":{"overrides":{"components":[{"name":"hypershift-local-hosting","enabled": false}]}}}' 1
    1
    The default MultiClusterEngine resource instance name is multiclusterengine, but you can get the MultiClusterEngine name from your cluster by running the following command: $ oc get mce.
    Note

    You can also disable the hypershift-addon for the local-cluster from the multicluster engine Operator console after disabling the hypershift-addon.

2.6.2.2. Disabling the hosted control planes feature

Before you disable the hosted control planes feature, you must first uninstall the HyperShift Operator.

Procedure

  1. Run the following command to disable the hosted control planes feature:

    $ oc patch mce multiclusterengine --type=merge -p '{"spec":{"overrides":{"components":[{"name":"hypershift","enabled": false}]}}}' 1
    1
    The default MultiClusterEngine resource instance name is multiclusterengine, but you can get the MultiClusterEngine name from your cluster by running the following command: $ oc get mce.
  2. You can verify that the hypershift and hypershift-local-hosting features are disabled in the MultiClusterEngine custom resource by running the following command:

    $ oc get mce multiclusterengine -o yaml 1
    1
    The default MultiClusterEngine resource instance name is multiclusterengine, but you can get the MultiClusterEngine name from your cluster by running the following command: $ oc get mce.

    See the following example where hypershift and hypershift-local-hosting have their enabled: flags set to false:

    apiVersion: multicluster.openshift.io/v1
    kind: MultiClusterEngine
    metadata:
      name: multiclusterengine
    spec:
      overrides:
        components:
        - name: hypershift
          enabled: false
        - name: hypershift-local-hosting
          enabled: false
Red Hat logoGithubRedditYoutubeTwitter

Learn

Try, buy, & sell

Communities

About Red Hat Documentation

We help Red Hat users innovate and achieve their goals with our products and services with content they can trust.

Making open source more inclusive

Red Hat is committed to replacing problematic language in our code, documentation, and web properties. For more details, see the Red Hat Blog.

About Red Hat

We deliver hardened solutions that make it easier for enterprises to work across platforms and environments, from the core datacenter to the network edge.

© 2024 Red Hat, Inc.