Chapter 1. Limits and scalability

1.1. Cluster maximums

Consider the following tested object maximums when you plan an OpenShift Dedicated cluster installation. The table specifies the maximum limits for each tested type in an OpenShift Dedicated cluster.

These guidelines are based on a cluster of 249 compute (also known as worker) nodes in a multiple availability zone configuration. For smaller clusters, the maximums are lower.

Table 1.1. Tested cluster maximums
Maximum type	4.x tested maximum
Number of pods ^[1]	25,000
Number of pods per node	250
Number of pods per core	There is no default value
Number of namespaces ^[2]	5,000
Number of pods per namespace ^[3]	25,000
Number of services ^[4]	10,000
Number of services per namespace	5,000
Number of back ends per service	5,000
Number of deployments per namespace ^[3]	2,000

[1] The pod count displayed here is the number of test pods. The actual number of pods depends on the memory, CPU, and storage requirements of the application.
[2] When there is a large number of active projects, etcd can suffer from poor performance if the keyspace grows excessively large and exceeds the space quota. Periodic maintenance of etcd, including defragmentation, is highly recommended to free up etcd storage.
[3] There are several control loops in the system that must iterate over all objects in a given namespace in reaction to state changes. Having a large number of objects of one type in a single namespace can make those loops expensive and slow down the processing of state changes. The limit assumes that the system has enough CPU, memory, and disk to satisfy the application requirements.
[4] Each service port and each service back end has a corresponding entry in iptables. The number of back ends of a given service impacts the size of the endpoints objects, which then impacts the size of data sent throughout the system.
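
As a planning aid, the tested maximums in Table 1.1 can be compared against a planned workload. The following is a minimal sketch, not an official tool: the dictionary keys, the function name `exceeded_maximums`, and the sample plan are illustrative assumptions, and only the numeric limits come from the table.

```python
# Tested maximums copied from Table 1.1.
# The key names are illustrative and not part of any OpenShift API.
TESTED_MAXIMUMS = {
    "pods": 25_000,
    "pods_per_node": 250,
    "namespaces": 5_000,
    "pods_per_namespace": 25_000,
    "services": 10_000,
    "services_per_namespace": 5_000,
    "backends_per_service": 5_000,
    "deployments_per_namespace": 2_000,
}


def exceeded_maximums(planned):
    """Return {name: (planned_count, tested_max)} for counts above the tested maximum."""
    unknown = set(planned) - set(TESTED_MAXIMUMS)
    if unknown:
        raise KeyError(f"unknown maximum type(s): {sorted(unknown)}")
    return {
        name: (count, TESTED_MAXIMUMS[name])
        for name, count in planned.items()
        if count > TESTED_MAXIMUMS[name]
    }


# Hypothetical plan: only the per-node pod count is above its tested maximum.
plan = {"pods": 18_000, "pods_per_node": 300, "services": 10_000}
print(exceeded_maximums(plan))  # {'pods_per_node': (300, 250)}
```

A count equal to the tested maximum is not reported, because the table lists values that were tested successfully. Staying within these numbers does not guarantee capacity, because the actual limits depend on application resource requirements.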

1.2. OpenShift Container Platform testing environment and configuration

The following table lists the OpenShift Container Platform environment and configuration on which the cluster maximums are tested for the AWS cloud platform.

Node	Type	vCPU	RAM (GiB)	Disk type	Disk size (GiB) / IOPS	Count	Region
Control plane/etcd ^[1]	m5.4xlarge	16	64	gp3	350 / 1,000	3	us-west-2
Infrastructure nodes ^[2]	r5.2xlarge	8	64	gp3	300 / 900	3	us-west-2
Workload ^[3]	m5.2xlarge	8	32	gp3	350 / 900	3	us-west-2
Compute nodes	m5.2xlarge	8	32	gp3	350 / 900	102	us-west-2

[1] io1 disks are used for control plane/etcd nodes in all versions prior to 4.10.
[2] Infrastructure nodes are used to host monitoring components because Prometheus can claim a large amount of memory, depending on usage patterns.
[3] Workload nodes are dedicated to running performance and scalability workload generators.

Larger cluster sizes and higher object counts might be reachable. However, the sizing of the infrastructure nodes limits the amount of memory that is available to Prometheus. When objects are created, modified, or deleted, Prometheus stores the resulting metrics in memory for roughly 3 hours before persisting them to disk. If the rate of creation, modification, or deletion of objects is too high, Prometheus can become overwhelmed and fail due to a lack of memory.

1.3. Control plane and infrastructure node sizing and scaling

When you install an OpenShift Dedicated cluster, the sizing of the control plane and infrastructure nodes is automatically determined by the compute node count.

If you change the number of compute nodes in your cluster after installation, the Red Hat Site Reliability Engineering (SRE) team scales the control plane and infrastructure nodes as required to maintain cluster stability.

1.3.1. Node sizing during installation

During the installation process, the sizing of the control plane and infrastructure nodes is dynamically calculated. The sizing calculation is based on the number of compute nodes in a cluster.

The following tables list the control plane and infrastructure node sizing that is applied during installation.

AWS control plane and infrastructure node size:

Number of compute nodes	Control plane size	Infrastructure node size
1 to 25	m5.2xlarge	r5.xlarge
26 to 100	m5.4xlarge	r5.2xlarge
101 to 249	m5.8xlarge	r5.4xlarge

Google Cloud control plane and infrastructure node size:

Number of compute nodes	Control plane size	Infrastructure node size
1 to 25	custom-8-32768	custom-4-32768-ext
26 to 100	custom-16-65536	custom-8-65536-ext
101 to 249	custom-32-131072	custom-16-131072-ext

Google Cloud control plane and infrastructure node size for clusters created on or after 21 June 2024:

Number of compute nodes	Control plane size	Infrastructure node size
1 to 25	n2-standard-8	n2-highmem-4
26 to 100	n2-standard-16	n2-highmem-8
101 to 249	n2-standard-32	n2-highmem-16

Note

The maximum number of compute nodes on OpenShift Dedicated clusters version 4.14.14 and later is 249. For earlier versions, the limit is 180.
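
The sizing tables above amount to a tiered lookup keyed by compute node count. The following is a minimal sketch of that lookup for estimation purposes. The platform keys (`aws`, `gcp`, `gcp_from_2024_06_21`) and the function name `node_sizes` are illustrative assumptions; the tiers and instance types are copied from the tables. The sketch uses the 249-node maximum, which applies to version 4.14.14 and later.

```python
# (upper bound of compute node range, control plane size, infrastructure node size)
# Values are copied from the sizing tables; the platform keys are illustrative.
SIZING = {
    "aws": [
        (25, "m5.2xlarge", "r5.xlarge"),
        (100, "m5.4xlarge", "r5.2xlarge"),
        (249, "m5.8xlarge", "r5.4xlarge"),
    ],
    "gcp": [
        (25, "custom-8-32768", "custom-4-32768-ext"),
        (100, "custom-16-65536", "custom-8-65536-ext"),
        (249, "custom-32-131072", "custom-16-131072-ext"),
    ],
    # Clusters created on or after 21 June 2024.
    "gcp_from_2024_06_21": [
        (25, "n2-standard-8", "n2-highmem-4"),
        (100, "n2-standard-16", "n2-highmem-8"),
        (249, "n2-standard-32", "n2-highmem-16"),
    ],
}


def node_sizes(platform, compute_nodes):
    """Return (control_plane_size, infrastructure_node_size) for a compute node count."""
    tiers = SIZING[platform]
    max_nodes = tiers[-1][0]
    if not 1 <= compute_nodes <= max_nodes:
        raise ValueError(f"compute node count must be between 1 and {max_nodes}")
    for upper_bound, control_plane, infra in tiers:
        if compute_nodes <= upper_bound:
            return control_plane, infra


print(node_sizes("aws", 60))  # ('m5.4xlarge', 'r5.2xlarge')
```

This reflects only the sizing applied at installation. After installation, the SRE team can resize the nodes based on observed consumption, so the lookup is a starting estimate rather than a guarantee.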

1.3.2. Node scaling after installation

If you change the number of compute nodes after installation, the Red Hat Site Reliability Engineering (SRE) team scales the control plane and infrastructure nodes as required to maintain platform stability.

Postinstallation scaling requirements for control plane and infrastructure nodes are assessed on a case-by-case basis. Node resource consumption and received alerts are taken into consideration.

Rules for control plane node resizing alerts

The resizing alert is triggered for the control plane nodes in a cluster when the following occurs:

Control plane nodes sustain over 66% utilization on average in a cluster.
Note
The maximum number of compute nodes on OpenShift Dedicated clusters version 4.14.14 and later is 249. For earlier versions, the limit is 180.

Rules for infrastructure node resizing alerts

Resizing alerts are triggered for the infrastructure nodes in a cluster when the nodes have sustained high CPU or memory utilization. Sustained high utilization is defined as follows:

Infrastructure nodes sustain over 50% utilization on average in a cluster with a single availability zone using 2 infrastructure nodes.
Infrastructure nodes sustain over 66% utilization on average in a cluster with multiple availability zones using 3 infrastructure nodes.
Note
The maximum number of compute nodes on OpenShift Dedicated cluster versions 4.14.14 and later is 249. For earlier versions, the limit is 180.
The resizing alerts only appear after sustained periods of high utilization. Short usage spikes, such as when one node temporarily goes down and utilization rises on the remaining node, do not trigger these alerts.
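
The threshold rules above can be illustrated with a short sketch. The source does not state how long utilization must stay high or how it is sampled, so this example assumes that "sustained" means the cluster-wide average exceeds the threshold in every sample of an observation window. The function name, the data layout, and the sample values are hypothetical; only the 50% and 66% thresholds come from the rules above.

```python
from statistics import mean

# Thresholds (percent) taken from the resizing alert rules.
CONTROL_PLANE_THRESHOLD = 66.0
INFRA_THRESHOLDS = {
    "single_az": 50.0,  # single availability zone, 2 infrastructure nodes
    "multi_az": 66.0,   # multiple availability zones, 3 infrastructure nodes
}


def is_sustained_high(samples, threshold_pct):
    """Return True if the per-sample node average exceeds the threshold in every sample.

    samples: list of samples, each a list of per-node utilization percentages.
    Assumption: "sustained" means every sample in the window is over the threshold.
    """
    return bool(samples) and all(mean(nodes) > threshold_pct for nodes in samples)


# Two infrastructure nodes in a single availability zone, three samples each.
steady_load = [[55, 60], [52, 58], [70, 51]]   # averages 57.5, 55.0, 60.5
short_spike = [[30, 35], [95, 0], [32, 33]]    # one node briefly down: 32.5, 47.5, 32.5

print(is_sustained_high(steady_load, INFRA_THRESHOLDS["single_az"]))  # True
print(is_sustained_high(short_spike, INFRA_THRESHOLDS["single_az"]))  # False
```

In the second case, one node briefly carries all of the load, but the average across both nodes stays under 50%, so the check does not fire. The same function applies to control plane nodes with `CONTROL_PLANE_THRESHOLD`.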

The SRE team might scale the control plane and infrastructure nodes for additional reasons, for example to manage an increase in resource consumption on the nodes.

1.3.3. Sizing considerations for larger clusters

For larger clusters, infrastructure node sizing can significantly affect scalability. Many factors influence the stated thresholds, including the etcd version and the storage data format.

Exceeding these limits does not necessarily mean that the cluster will fail. In most cases, exceeding these numbers results in lower overall performance.
