Scalability and performance
Scaling your OpenShift Container Platform cluster and tuning performance in production environments
Abstract
Chapter 1. OpenShift Container Platform scalability and performance overview
OpenShift Container Platform provides best practices and tools to help you optimize the performance and scale of your clusters. The following documentation provides information on recommended performance and scalability practices, reference design specifications, optimization, and low latency tuning.
To contact Red Hat support, see Getting support.
Some performance and scalability Operators have release cycles that are independent from OpenShift Container Platform release cycles. For more information, see Openshift Operators.
Recommended performance and scalability practices
Recommended control plane practices
Recommended infrastructure practices
Telco reference design specifications
Planning, optimization, and measurement
Planning your environment according to object maximums
Recommended practices for IBM Z and IBM LinuxONE
Using the Node Tuning Operator
Using CPU Manager and Topology Manager
Scheduling NUMA-aware workloads
Optimizing storage, routing, networking and CPU usage
Managing bare metal hosts and events
What are huge pages and how are they used by apps
Low latency tuning for improving cluster stability and partitioning workload
Improving cluster stability in high latency environments using worker latency profiles
Chapter 2. Recommended performance and scalability practices
2.1. Recommended control plane practices
This topic provides recommended performance and scalability practices for control planes in OpenShift Container Platform.
2.1.1. Recommended practices for scaling the cluster
The guidance in this section is only relevant for installations with cloud provider integration.
Apply the following best practices to scale the number of worker machines in your OpenShift Container Platform cluster. You scale the worker machines by increasing or decreasing the number of replicas that are defined in the worker machine set.
When scaling up the cluster to higher node counts:
- Spread nodes across all of the available zones for higher availability.
- Scale up by no more than 25 to 50 machines at once.
- Consider creating new compute machine sets in each available zone with alternative instance types of similar size to help mitigate any periodic provider capacity constraints. For example, on AWS, use m5.large and m5d.large.
Cloud providers might implement a quota for API services. Therefore, gradually scale the cluster.
The controller might not be able to create the machines if the replicas in the compute machine sets are set to higher numbers all at one time. The number of requests the cloud platform, which OpenShift Container Platform is deployed on top of, is able to handle impacts the process. The controller will start to query more while trying to create, check, and update the machines with the status. The cloud platform on which OpenShift Container Platform is deployed has API request limits; excessive queries might lead to machine creation failures due to cloud platform limitations.
Enable machine health checks when scaling to large node counts. In case of failures, the health checks monitor the condition and automatically repair unhealthy machines.
When scaling large and dense clusters to lower node counts, it might take large amounts of time because the process involves draining or evicting the objects running on the nodes being terminated in parallel. Also, the client might start to throttle the requests if there are too many objects to evict. The default client queries per second (QPS) and burst rates are currently set to 50
and 100
respectively. These values cannot be modified in OpenShift Container Platform.
2.1.2. Control plane node sizing
The control plane node resource requirements depend on the number and type of nodes and objects in the cluster. The following control plane node size recommendations are based on the results of a control plane density focused testing, or Cluster-density. This test creates the following objects across a given number of namespaces:
- 1 image stream
- 1 build
-
5 deployments, with 2 pod replicas in a
sleep
state, mounting 4 secrets, 4 config maps, and 1 downward API volume each - 5 services, each one pointing to the TCP/8080 and TCP/8443 ports of one of the previous deployments
- 1 route pointing to the first of the previous services
- 10 secrets containing 2048 random string characters
- 10 config maps containing 2048 random string characters
Number of worker nodes | Cluster-density (namespaces) | CPU cores | Memory (GB) |
---|---|---|---|
24 | 500 | 4 | 16 |
120 | 1000 | 8 | 32 |
252 | 4000 | 16, but 24 if using the OVN-Kubernetes network plug-in | 64, but 128 if using the OVN-Kubernetes network plug-in |
501, but untested with the OVN-Kubernetes network plug-in | 4000 | 16 | 96 |
The data from the table above is based on an OpenShift Container Platform running on top of AWS, using r5.4xlarge instances as control-plane nodes and m5.2xlarge instances as worker nodes.
On a large and dense cluster with three control plane nodes, the CPU and memory usage will spike up when one of the nodes is stopped, rebooted, or fails. The failures can be due to unexpected issues with power, network, underlying infrastructure, or intentional cases where the cluster is restarted after shutting it down to save costs. The remaining two control plane nodes must handle the load in order to be highly available, which leads to increase in the resource usage. This is also expected during upgrades because the control plane nodes are cordoned, drained, and rebooted serially to apply the operating system updates, as well as the control plane Operators update. To avoid cascading failures, keep the overall CPU and memory resource usage on the control plane nodes to at most 60% of all available capacity to handle the resource usage spikes. Increase the CPU and memory on the control plane nodes accordingly to avoid potential downtime due to lack of resources.
The node sizing varies depending on the number of nodes and object counts in the cluster. It also depends on whether the objects are actively being created on the cluster. During object creation, the control plane is more active in terms of resource usage compared to when the objects are in the running
phase.
Operator Lifecycle Manager (OLM ) runs on the control plane nodes and its memory footprint depends on the number of namespaces and user installed operators that OLM needs to manage on the cluster. Control plane nodes need to be sized accordingly to avoid OOM kills. Following data points are based on the results from cluster maximums testing.
Number of namespaces | OLM memory at idle state (GB) | OLM memory with 5 user operators installed (GB) |
---|---|---|
500 | 0.823 | 1.7 |
1000 | 1.2 | 2.5 |
1500 | 1.7 | 3.2 |
2000 | 2 | 4.4 |
3000 | 2.7 | 5.6 |
4000 | 3.8 | 7.6 |
5000 | 4.2 | 9.02 |
6000 | 5.8 | 11.3 |
7000 | 6.6 | 12.9 |
8000 | 6.9 | 14.8 |
9000 | 8 | 17.7 |
10,000 | 9.9 | 21.6 |
You can modify the control plane node size in a running OpenShift Container Platform 4.15 cluster for the following configurations only:
- Clusters installed with a user-provisioned installation method.
- AWS clusters installed with an installer-provisioned infrastructure installation method.
- Clusters that use a control plane machine set to manage control plane machines.
For all other configurations, you must estimate your total node count and use the suggested control plane node size during installation.
The recommendations are based on the data points captured on OpenShift Container Platform clusters with OpenShift SDN as the network plugin.
In OpenShift Container Platform 4.15, half of a CPU core (500 millicore) is now reserved by the system by default compared to OpenShift Container Platform 3.11 and previous versions. The sizes are determined taking that into consideration.
2.1.2.1. Selecting a larger Amazon Web Services instance type for control plane machines
If the control plane machines in an Amazon Web Services (AWS) cluster require more resources, you can select a larger AWS instance type for the control plane machines to use.
The procedure for clusters that use a control plane machine set is different from the procedure for clusters that do not use a control plane machine set.
If you are uncertain about the state of the ControlPlaneMachineSet
CR in your cluster, you can verify the CR status.
2.1.2.1.1. Changing the Amazon Web Services instance type by using a control plane machine set
You can change the Amazon Web Services (AWS) instance type that your control plane machines use by updating the specification in the control plane machine set custom resource (CR).
Prerequisites
- Your AWS cluster uses a control plane machine set.
Procedure
Edit your control plane machine set CR by running the following command:
$ oc --namespace openshift-machine-api edit controlplanemachineset.machine.openshift.io cluster
Edit the following line under the
providerSpec
field:providerSpec: value: ... instanceType: <compatible_aws_instance_type> 1
- 1
- Specify a larger AWS instance type with the same base as the previous selection. For example, you can change
m6i.xlarge
tom6i.2xlarge
orm6i.4xlarge
.
Save your changes.
-
For clusters that use the default
RollingUpdate
update strategy, the Operator automatically propagates the changes to your control plane configuration. -
For clusters that are configured to use the
OnDelete
update strategy, you must replace your control plane machines manually.
-
For clusters that use the default
Additional resources
2.1.2.1.2. Changing the Amazon Web Services instance type by using the AWS console
You can change the Amazon Web Services (AWS) instance type that your control plane machines use by updating the instance type in the AWS console.
Prerequisites
- You have access to the AWS console with the permissions required to modify the EC2 Instance for your cluster.
-
You have access to the OpenShift Container Platform cluster as a user with the
cluster-admin
role.
Procedure
- Open the AWS console and fetch the instances for the control plane machines.
Choose one control plane machine instance.
- For the selected control plane machine, back up the etcd data by creating an etcd snapshot. For more information, see "Backing up etcd".
- In the AWS console, stop the control plane machine instance.
- Select the stopped instance, and click Actions → Instance Settings → Change instance type.
-
Change the instance to a larger type, ensuring that the type is the same base as the previous selection, and apply changes. For example, you can change
m6i.xlarge
tom6i.2xlarge
orm6i.4xlarge
. - Start the instance.
-
If your OpenShift Container Platform cluster has a corresponding
Machine
object for the instance, update the instance type of the object to match the instance type set in the AWS console.
- Repeat this process for each control plane machine.
Additional resources
2.2. Recommended infrastructure practices
This topic provides recommended performance and scalability practices for infrastructure in OpenShift Container Platform.
2.2.1. Infrastructure node sizing
Infrastructure nodes are nodes that are labeled to run pieces of the OpenShift Container Platform environment. The infrastructure node resource requirements depend on the cluster age, nodes, and objects in the cluster, as these factors can lead to an increase in the number of metrics or time series in Prometheus. The following infrastructure node size recommendations are based on the results observed in cluster-density testing detailed in the Control plane node sizing section, where the monitoring stack and the default ingress-controller were moved to these nodes.
Number of worker nodes | Cluster density, or number of namespaces | CPU cores | Memory (GB) |
---|---|---|---|
27 | 500 | 4 | 24 |
120 | 1000 | 8 | 48 |
252 | 4000 | 16 | 128 |
501 | 4000 | 32 | 128 |
In general, three infrastructure nodes are recommended per cluster.
These sizing recommendations should be used as a guideline. Prometheus is a highly memory intensive application; the resource usage depends on various factors including the number of nodes, objects, the Prometheus metrics scraping interval, metrics or time series, and the age of the cluster. In addition, the router resource usage can also be affected by the number of routes and the amount/type of inbound requests.
These recommendations apply only to infrastructure nodes hosting Monitoring, Ingress and Registry infrastructure components installed during cluster creation.
In OpenShift Container Platform 4.15, half of a CPU core (500 millicore) is now reserved by the system by default compared to OpenShift Container Platform 3.11 and previous versions. This influences the stated sizing recommendations.
2.2.2. Scaling the Cluster Monitoring Operator
OpenShift Container Platform exposes metrics that the Cluster Monitoring Operator (CMO) collects and stores in the Prometheus-based monitoring stack. As an administrator, you can view dashboards for system resources, containers, and components metrics in the OpenShift Container Platform web console by navigating to Observe → Dashboards.
2.2.3. Prometheus database storage requirements
Red Hat performed various tests for different scale sizes.
- The following Prometheus storage requirements are not prescriptive and should be used as a reference. Higher resource consumption might be observed in your cluster depending on workload activity and resource density, including the number of pods, containers, routes, or other resources exposing metrics collected by Prometheus.
- You can configure the size-based data retention policy to suit your storage requirements.
Number of nodes | Number of pods (2 containers per pod) | Prometheus storage growth per day | Prometheus storage growth per 15 days | Network (per tsdb chunk) |
---|---|---|---|---|
50 | 1800 | 6.3 GB | 94 GB | 16 MB |
100 | 3600 | 13 GB | 195 GB | 26 MB |
150 | 5400 | 19 GB | 283 GB | 36 MB |
200 | 7200 | 25 GB | 375 GB | 46 MB |
Approximately 20 percent of the expected size was added as overhead to ensure that the storage requirements do not exceed the calculated value.
The above calculation is for the default OpenShift Container Platform Cluster Monitoring Operator.
CPU utilization has minor impact. The ratio is approximately 1 core out of 40 per 50 nodes and 1800 pods.
Recommendations for OpenShift Container Platform
- Use at least two infrastructure (infra) nodes.
- Use at least three openshift-container-storage nodes with non-volatile memory express (SSD or NVMe) drives.
2.2.4. Configuring cluster monitoring
You can increase the storage capacity for the Prometheus component in the cluster monitoring stack.
Procedure
To increase the storage capacity for Prometheus:
Create a YAML configuration file,
cluster-monitoring-config.yaml
. For example:apiVersion: v1 kind: ConfigMap data: config.yaml: | prometheusK8s: retention: {{PROMETHEUS_RETENTION_PERIOD}} 1 nodeSelector: node-role.kubernetes.io/infra: "" volumeClaimTemplate: spec: storageClassName: {{STORAGE_CLASS}} 2 resources: requests: storage: {{PROMETHEUS_STORAGE_SIZE}} 3 alertmanagerMain: nodeSelector: node-role.kubernetes.io/infra: "" volumeClaimTemplate: spec: storageClassName: {{STORAGE_CLASS}} 4 resources: requests: storage: {{ALERTMANAGER_STORAGE_SIZE}} 5 metadata: name: cluster-monitoring-config namespace: openshift-monitoring
- 1
- The default value of Prometheus retention is
PROMETHEUS_RETENTION_PERIOD=15d
. Units are measured in time using one of these suffixes: s, m, h, d. - 2 4
- The storage class for your cluster.
- 3
- A typical value is
PROMETHEUS_STORAGE_SIZE=2000Gi
. Storage values can be a plain integer or a fixed-point integer using one of these suffixes: E, P, T, G, M, K. You can also use the power-of-two equivalents: Ei, Pi, Ti, Gi, Mi, Ki. - 5
- A typical value is
ALERTMANAGER_STORAGE_SIZE=20Gi
. Storage values can be a plain integer or a fixed-point integer using one of these suffixes: E, P, T, G, M, K. You can also use the power-of-two equivalents: Ei, Pi, Ti, Gi, Mi, Ki.
- Add values for the retention period, storage class, and storage sizes.
- Save the file.
Apply the changes by running:
$ oc create -f cluster-monitoring-config.yaml
2.2.5. Additional resources
2.3. Recommended etcd practices
This topic provides recommended performance and scalability practices for etcd in OpenShift Container Platform.
2.3.1. Recommended etcd practices
Because etcd writes data to disk and persists proposals on disk, its performance depends on disk performance. Although etcd is not particularly I/O intensive, it requires a low latency block device for optimal performance and stability. Because etcd’s consensus protocol depends on persistently storing metadata to a log (WAL), etcd is sensitive to disk-write latency. Slow disks and disk activity from other processes can cause long fsync latencies.
Those latencies can cause etcd to miss heartbeats, not commit new proposals to the disk on time, and ultimately experience request timeouts and temporary leader loss. High write latencies also lead to an OpenShift API slowness, which affects cluster performance. Because of these reasons, avoid colocating other workloads on the control-plane nodes that are I/O sensitive or intensive and share the same underlying I/O infrastructure.
In terms of latency, run etcd on top of a block device that can write at least 50 IOPS of 8000 bytes long sequentially. That is, with a latency of 10ms, keep in mind that uses fdatasync to synchronize each write in the WAL. For heavy loaded clusters, sequential 500 IOPS of 8000 bytes (2 ms) are recommended. To measure those numbers, you can use a benchmarking tool, such as fio.
To achieve such performance, run etcd on machines that are backed by SSD or NVMe disks with low latency and high throughput. Consider single-level cell (SLC) solid-state drives (SSDs), which provide 1 bit per memory cell, are durable and reliable, and are ideal for write-intensive workloads.
The load on etcd arises from static factors, such as the number of nodes and pods, and dynamic factors, including changes in endpoints due to pod autoscaling, pod restarts, job executions, and other workload-related events. To accurately size your etcd setup, you must analyze the specific requirements of your workload. Consider the number of nodes, pods, and other relevant factors that impact the load on etcd.
The following hard drive practices provide optimal etcd performance:
- Use dedicated etcd drives. Avoid drives that communicate over the network, such as iSCSI. Do not place log files or other heavy workloads on etcd drives.
- Prefer drives with low latency to support fast read and write operations.
- Prefer high-bandwidth writes for faster compactions and defragmentation.
- Prefer high-bandwidth reads for faster recovery from failures.
- Use solid state drives as a minimum selection. Prefer NVMe drives for production environments.
- Use server-grade hardware for increased reliability.
Avoid NAS or SAN setups and spinning drives. Ceph Rados Block Device (RBD) and other types of network-attached storage can result in unpredictable network latency. To provide fast storage to etcd nodes at scale, use PCI passthrough to pass NVM devices directly to the nodes.
Always benchmark by using utilities such as fio. You can use such utilities to continuously monitor the cluster performance as it increases.
Avoid using the Network File System (NFS) protocol or other network based file systems.
Some key metrics to monitor on a deployed OpenShift Container Platform cluster are p99 of etcd disk write ahead log duration and the number of etcd leader changes. Use Prometheus to track these metrics.
The etcd member database sizes can vary in a cluster during normal operations. This difference does not affect cluster upgrades, even if the leader size is different from the other members.
To validate the hardware for etcd before or after you create the OpenShift Container Platform cluster, you can use fio.
Prerequisites
- Container runtimes such as Podman or Docker are installed on the machine that you’re testing.
-
Data is written to the
/var/lib/etcd
path.
Procedure
Run fio and analyze the results:
If you use Podman, run this command:
$ sudo podman run --volume /var/lib/etcd:/var/lib/etcd:Z quay.io/cloud-bulldozer/etcd-perf
If you use Docker, run this command:
$ sudo docker run --volume /var/lib/etcd:/var/lib/etcd:Z quay.io/cloud-bulldozer/etcd-perf
The output reports whether the disk is fast enough to host etcd by comparing the 99th percentile of the fsync metric captured from the run to see if it is less than 10 ms. A few of the most important etcd metrics that might affected by I/O performance are as follow:
-
etcd_disk_wal_fsync_duration_seconds_bucket
metric reports the etcd’s WAL fsync duration -
etcd_disk_backend_commit_duration_seconds_bucket
metric reports the etcd backend commit latency duration -
etcd_server_leader_changes_seen_total
metric reports the leader changes
Because etcd replicates the requests among all the members, its performance strongly depends on network input/output (I/O) latency. High network latencies result in etcd heartbeats taking longer than the election timeout, which results in leader elections that are disruptive to the cluster. A key metric to monitor on a deployed OpenShift Container Platform cluster is the 99th percentile of etcd network peer latency on each etcd cluster member. Use Prometheus to track the metric.
The histogram_quantile(0.99, rate(etcd_network_peer_round_trip_time_seconds_bucket[2m]))
metric reports the round trip time for etcd to finish replicating the client requests between the members. Ensure that it is less than 50 ms.
2.3.2. Moving etcd to a different disk
You can move etcd from a shared disk to a separate disk to prevent or resolve performance issues.
The Machine Config Operator (MCO) is responsible for mounting a secondary disk for OpenShift Container Platform 4.15 container storage.
This encoded script only supports device names for the following device types:
- SCSI or SATA
-
/dev/sd*
- Virtual device
-
/dev/vd*
- NVMe
-
/dev/nvme*[0-9]*n*
Limitations
-
When the new disk is attached to the cluster, the etcd database is part of the root mount. It is not part of the secondary disk or the intended disk when the primary node is recreated. As a result, the primary node will not create a separate
/var/lib/etcd
mount.
Prerequisites
- You have a backup of your cluster’s etcd data.
-
You have installed the OpenShift CLI (
oc
). -
You have access to the cluster with
cluster-admin
privileges. - Add additional disks before uploading the machine configuration.
-
The
MachineConfigPool
must matchmetadata.labels[machineconfiguration.openshift.io/role]
. This applies to a controller, worker, or a custom pool.
This procedure does not move parts of the root file system, such as /var/
, to another disk or partition on an installed node.
This procedure is not supported when using control plane machine sets.
Procedure
Attach the new disk to the cluster and verify that the disk is detected in the node by running the
lsblk
command in a debug shell:$ oc debug node/<node_name>
# lsblk
Note the device name of the new disk reported by the
lsblk
command.Create the following script and name it
etcd-find-secondary-device.sh
:#!/bin/bash set -uo pipefail for device in <device_type_glob>; do 1 /usr/sbin/blkid "${device}" &> /dev/null if [ $? == 2 ]; then echo "secondary device found ${device}" echo "creating filesystem for etcd mount" mkfs.xfs -L var-lib-etcd -f "${device}" &> /dev/null udevadm settle touch /etc/var-lib-etcd-mount exit fi done echo "Couldn't find secondary block device!" >&2 exit 77
- 1
- Replace
<device_type_glob>
with a shell glob for your block device type. For SCSI or SATA drives, use/dev/sd*
; for virtual drives, use/dev/vd*
; for NVMe drives, use/dev/nvme*[0-9]*n*
.
Create a base64-encoded string from the
etcd-find-secondary-device.sh
script and note its contents:$ base64 -w0 etcd-find-secondary-device.sh
Create a
MachineConfig
YAML file namedetcd-mc.yml
with contents such as the following:apiVersion: machineconfiguration.openshift.io/v1 kind: MachineConfig metadata: labels: machineconfiguration.openshift.io/role: master name: 98-var-lib-etcd spec: config: ignition: version: 3.1.0 storage: files: - path: /etc/find-secondary-device mode: 0755 contents: source: data:text/plain;charset=utf-8;base64,<encoded_etcd_find_secondary_device_script> 1 systemd: units: - name: find-secondary-device.service enabled: true contents: | [Unit] Description=Find secondary device DefaultDependencies=false After=systemd-udev-settle.service Before=local-fs-pre.target ConditionPathExists=!/etc/var-lib-etcd-mount [Service] RemainAfterExit=yes ExecStart=/etc/find-secondary-device RestartForceExitStatus=77 [Install] WantedBy=multi-user.target - name: var-lib-etcd.mount enabled: true contents: | [Unit] Before=local-fs.target [Mount] What=/dev/disk/by-label/var-lib-etcd Where=/var/lib/etcd Type=xfs TimeoutSec=120s [Install] RequiredBy=local-fs.target - name: sync-var-lib-etcd-to-etcd.service enabled: true contents: | [Unit] Description=Sync etcd data if new mount is empty DefaultDependencies=no After=var-lib-etcd.mount var.mount Before=crio.service [Service] Type=oneshot RemainAfterExit=yes ExecCondition=/usr/bin/test ! -d /var/lib/etcd/member ExecStart=/usr/sbin/setsebool -P rsync_full_access 1 ExecStart=/bin/rsync -ar /sysroot/ostree/deploy/rhcos/var/lib/etcd/ /var/lib/etcd/ ExecStart=/usr/sbin/semanage fcontext -a -t container_var_lib_t '/var/lib/etcd(/.*)?' ExecStart=/usr/sbin/setsebool -P rsync_full_access 0 TimeoutSec=0 [Install] WantedBy=multi-user.target graphical.target - name: restorecon-var-lib-etcd.service enabled: true contents: | [Unit] Description=Restore recursive SELinux security contexts DefaultDependencies=no After=var-lib-etcd.mount Before=crio.service [Service] Type=oneshot RemainAfterExit=yes ExecStart=/sbin/restorecon -R /var/lib/etcd/ TimeoutSec=0 [Install] WantedBy=multi-user.target graphical.target
- 1
- Replace
<encoded_etcd_find_secondary_device_script>
with the encoded script contents that you noted.
Verification steps
Run the
grep /var/lib/etcd /proc/mounts
command in a debug shell for the node to ensure that the disk is mounted:$ oc debug node/<node_name>
# grep -w "/var/lib/etcd" /proc/mounts
Example output
/dev/sdb /var/lib/etcd xfs rw,seclabel,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota 0 0
Additional resources
2.3.3. Defragmenting etcd data
For large and dense clusters, etcd can suffer from poor performance if the keyspace grows too large and exceeds the space quota. Periodically maintain and defragment etcd to free up space in the data store. Monitor Prometheus for etcd metrics and defragment it when required; otherwise, etcd can raise a cluster-wide alarm that puts the cluster into a maintenance mode that accepts only key reads and deletes.
Monitor these key metrics:
-
etcd_server_quota_backend_bytes
, which is the current quota limit -
etcd_mvcc_db_total_size_in_use_in_bytes
, which indicates the actual database usage after a history compaction -
etcd_mvcc_db_total_size_in_bytes
, which shows the database size, including free space waiting for defragmentation
Defragment etcd data to reclaim disk space after events that cause disk fragmentation, such as etcd history compaction.
History compaction is performed automatically every five minutes and leaves gaps in the back-end database. This fragmented space is available for use by etcd, but is not available to the host file system. You must defragment etcd to make this space available to the host file system.
Defragmentation occurs automatically, but you can also trigger it manually.
Automatic defragmentation is good for most cases, because the etcd operator uses cluster information to determine the most efficient operation for the user.
2.3.3.1. Automatic defragmentation
The etcd Operator automatically defragments disks. No manual intervention is needed.
Verify that the defragmentation process is successful by viewing one of these logs:
- etcd logs
- cluster-etcd-operator pod
- operator status error log
Automatic defragmentation can cause leader election failure in various OpenShift core components, such as the Kubernetes controller manager, which triggers a restart of the failing component. The restart is harmless and either triggers failover to the next running instance or the component resumes work again after the restart.
Example log output for successful defragmentation
etcd member has been defragmented: <member_name>, memberID: <member_id>
Example log output for unsuccessful defragmentation
failed defrag on member: <member_name>, memberID: <member_id>: <error_message>
2.3.3.2. Manual defragmentation
A Prometheus alert indicates when you need to use manual defragmentation. The alert is displayed in two cases:
- When etcd uses more than 50% of its available space for more than 10 minutes
- When etcd is actively using less than 50% of its total database size for more than 10 minutes
You can also determine whether defragmentation is needed by checking the etcd database size in MB that will be freed by defragmentation with the PromQL expression: (etcd_mvcc_db_total_size_in_bytes - etcd_mvcc_db_total_size_in_use_in_bytes)/1024/1024
Defragmenting etcd is a blocking action. The etcd member will not respond until defragmentation is complete. For this reason, wait at least one minute between defragmentation actions on each of the pods to allow the cluster to recover.
Follow this procedure to defragment etcd data on each etcd member.
Prerequisites
-
You have access to the cluster as a user with the
cluster-admin
role.
Procedure
Determine which etcd member is the leader, because the leader should be defragmented last.
Get the list of etcd pods:
$ oc -n openshift-etcd get pods -l k8s-app=etcd -o wide
Example output
etcd-ip-10-0-159-225.example.redhat.com 3/3 Running 0 175m 10.0.159.225 ip-10-0-159-225.example.redhat.com <none> <none> etcd-ip-10-0-191-37.example.redhat.com 3/3 Running 0 173m 10.0.191.37 ip-10-0-191-37.example.redhat.com <none> <none> etcd-ip-10-0-199-170.example.redhat.com 3/3 Running 0 176m 10.0.199.170 ip-10-0-199-170.example.redhat.com <none> <none>
Choose a pod and run the following command to determine which etcd member is the leader:
$ oc rsh -n openshift-etcd etcd-ip-10-0-159-225.example.redhat.com etcdctl endpoint status --cluster -w table
Example output
Defaulting container name to etcdctl. Use 'oc describe pod/etcd-ip-10-0-159-225.example.redhat.com -n openshift-etcd' to see all of the containers in this pod. +---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+ | ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS | +---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+ | https://10.0.191.37:2379 | 251cd44483d811c3 | 3.5.9 | 104 MB | false | false | 7 | 91624 | 91624 | | | https://10.0.159.225:2379 | 264c7c58ecbdabee | 3.5.9 | 104 MB | false | false | 7 | 91624 | 91624 | | | https://10.0.199.170:2379 | 9ac311f93915cc79 | 3.5.9 | 104 MB | true | false | 7 | 91624 | 91624 | | +---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
Based on the
IS LEADER
column of this output, thehttps://10.0.199.170:2379
endpoint is the leader. Matching this endpoint with the output of the previous step, the pod name of the leader isetcd-ip-10-0-199-170.example.redhat.com
.
Defragment an etcd member.
Connect to the running etcd container, passing in the name of a pod that is not the leader:
$ oc rsh -n openshift-etcd etcd-ip-10-0-159-225.example.redhat.com
Unset the
ETCDCTL_ENDPOINTS
environment variable:sh-4.4# unset ETCDCTL_ENDPOINTS
Defragment the etcd member:
sh-4.4# etcdctl --command-timeout=30s --endpoints=https://localhost:2379 defrag
Example output
Finished defragmenting etcd member[https://localhost:2379]
If a timeout error occurs, increase the value for
--command-timeout
until the command succeeds.Verify that the database size was reduced:
sh-4.4# etcdctl endpoint status -w table --cluster
Example output
+---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+ | ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS | +---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+ | https://10.0.191.37:2379 | 251cd44483d811c3 | 3.5.9 | 104 MB | false | false | 7 | 91624 | 91624 | | | https://10.0.159.225:2379 | 264c7c58ecbdabee | 3.5.9 | 41 MB | false | false | 7 | 91624 | 91624 | | 1 | https://10.0.199.170:2379 | 9ac311f93915cc79 | 3.5.9 | 104 MB | true | false | 7 | 91624 | 91624 | | +---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
This example shows that the database size for this etcd member is now 41 MB as opposed to the starting size of 104 MB.
Repeat these steps to connect to each of the other etcd members and defragment them. Always defragment the leader last.
Wait at least one minute between defragmentation actions to allow the etcd pod to recover. Until the etcd pod recovers, the etcd member will not respond.
If any
NOSPACE
alarms were triggered due to the space quota being exceeded, clear them.Check if there are any
NOSPACE
alarms:sh-4.4# etcdctl alarm list
Example output
memberID:12345678912345678912 alarm:NOSPACE
Clear the alarms:
sh-4.4# etcdctl alarm disarm
2.3.4. Setting tuning parameters for etcd
You can set the control plane hardware speed to "Standard"
, "Slower"
, or the default, which is ""
.
The default setting allows the system to decide which speed to use. This value enables upgrades from versions where this feature does not exist, as the system can select values from previous versions.
By selecting one of the other values, you are overriding the default. If you see many leader elections due to timeouts or missed heartbeats and your system is set to ""
or "Standard"
, set the hardware speed to "Slower"
to make the system more tolerant to the increased latency.
Tuning etcd latency tolerances is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.
For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.
2.3.4.1. Changing hardware speed tolerance
To change the hardware speed tolerance for etcd, complete the following steps.
Prerequisites
-
You have edited the cluster instance to enable
TechPreviewNoUpgrade
features. For more information, see "Understanding feature gates" in the Additional resources.
Procedure
Check to see what the current value is by entering the following command:
$ oc describe etcd/cluster | grep "Control Plane Hardware Speed"
Example output
Control Plane Hardware Speed: <VALUE>
NoteIf the output is empty, the field has not been set and should be considered as the default ("").
Change the value by entering the following command. Replace
<value>
with one of the valid values:""
,"Standard"
, or"Slower"
:$ oc patch etcd/cluster --type=merge -p '{"spec": {"controlPlaneHardwareSpeed": "<value>"}}'
The following table indicates the heartbeat interval and leader election timeout for each profile. These values are subject to change.
Profile
ETCD_HEARTBEAT_INTERVAL
ETCD_LEADER_ELECTION_TIMEOUT
""
Varies depending on platform
Varies depending on platform
Standard
100
1000
Slower
500
2500
Review the output:
Example output
etcd.operator.openshift.io/cluster patched
If you enter any value besides the valid values, error output is displayed. For example, if you entered
"Faster"
as the value, the output is as follows:Example output
The Etcd "cluster" is invalid: spec.controlPlaneHardwareSpeed: Unsupported value: "Faster": supported values: "", "Standard", "Slower"
Verify that the value was changed by entering the following command:
$ oc describe etcd/cluster | grep "Control Plane Hardware Speed"
Example output
Control Plane Hardware Speed: ""
Wait for etcd pods to roll out:
$ oc get pods -n openshift-etcd -w
The following output shows the expected entries for master-0. Before you continue, wait until all masters show a status of
4/4 Running
.Example output
installer-9-ci-ln-qkgs94t-72292-9clnd-master-0 0/1 Pending 0 0s installer-9-ci-ln-qkgs94t-72292-9clnd-master-0 0/1 Pending 0 0s installer-9-ci-ln-qkgs94t-72292-9clnd-master-0 0/1 ContainerCreating 0 0s installer-9-ci-ln-qkgs94t-72292-9clnd-master-0 0/1 ContainerCreating 0 1s installer-9-ci-ln-qkgs94t-72292-9clnd-master-0 1/1 Running 0 2s installer-9-ci-ln-qkgs94t-72292-9clnd-master-0 0/1 Completed 0 34s installer-9-ci-ln-qkgs94t-72292-9clnd-master-0 0/1 Completed 0 36s installer-9-ci-ln-qkgs94t-72292-9clnd-master-0 0/1 Completed 0 36s etcd-guard-ci-ln-qkgs94t-72292-9clnd-master-0 0/1 Running 0 26m etcd-ci-ln-qkgs94t-72292-9clnd-master-0 4/4 Terminating 0 11m etcd-ci-ln-qkgs94t-72292-9clnd-master-0 4/4 Terminating 0 11m etcd-ci-ln-qkgs94t-72292-9clnd-master-0 0/4 Pending 0 0s etcd-ci-ln-qkgs94t-72292-9clnd-master-0 0/4 Init:1/3 0 1s etcd-ci-ln-qkgs94t-72292-9clnd-master-0 0/4 Init:2/3 0 2s etcd-ci-ln-qkgs94t-72292-9clnd-master-0 0/4 PodInitializing 0 3s etcd-ci-ln-qkgs94t-72292-9clnd-master-0 3/4 Running 0 4s etcd-guard-ci-ln-qkgs94t-72292-9clnd-master-0 1/1 Running 0 26m etcd-ci-ln-qkgs94t-72292-9clnd-master-0 3/4 Running 0 20s etcd-ci-ln-qkgs94t-72292-9clnd-master-0 4/4 Running 0 20s
Enter the following command to review to the values:
$ oc describe -n openshift-etcd pod/<ETCD_PODNAME> | grep -e HEARTBEAT_INTERVAL -e ELECTION_TIMEOUT
NoteThese values might not have changed from the default.
Additional resources
Chapter 3. Reference design specifications
3.1. Telco core and RAN DU reference design specifications
The telco core reference design specification (RDS) describes OpenShift Container Platform 4.15 clusters running on commodity hardware that can support large scale telco applications including control plane and some centralized data plane functions.
The telco RAN RDS describes the configuration for clusters running on commodity hardware to host 5G workloads in the Radio Access Network (RAN).
3.1.1. Reference design specifications for telco 5G deployments
Red Hat and certified partners offer deep technical expertise and support for networking and operational capabilities required to run telco applications on OpenShift Container Platform 4.15 clusters.
Red Hat’s telco partners require a well-integrated, well-tested, and stable environment that can be replicated at scale for enterprise 5G solutions. The telco core and RAN DU reference design specifications (RDS) outline the recommended solution architecture based on a specific version of OpenShift Container Platform. Each RDS describes a tested and validated platform configuration for telco core and RAN DU use models. The RDS ensures an optimal experience when running your applications by defining the set of critical KPIs for telco 5G core and RAN DU. Following the RDS minimizes high severity escalations and improves application stability.
5G use cases are evolving and your workloads are continually changing. Red Hat is committed to iterating over the telco core and RAN DU RDS to support evolving requirements based on customer and partner feedback.
3.1.2. Reference design scope
The telco core and telco RAN reference design specifications (RDS) capture the recommended, tested, and supported configurations to get reliable and repeatable performance for clusters running the telco core and telco RAN profiles.
Each RDS includes the released features and supported configurations that are engineered and validated for clusters to run the individual profiles. The configurations provide a baseline OpenShift Container Platform installation that meets feature and KPI targets. Each RDS also describes expected variations for each individual configuration. Validation of each RDS includes many long duration and at-scale tests.
The validated reference configurations are updated for each major Y-stream release of OpenShift Container Platform. Z-stream patch releases are periodically re-tested against the reference configurations.
3.1.3. Deviations from the reference design
Deviating from the validated telco core and telco RAN DU reference design specifications (RDS) can have significant impact beyond the specific component or feature that you change. Deviations require analysis and engineering in the context of the complete solution.
All deviations from the RDS should be analyzed and documented with clear action tracking information. Due diligence is expected from partners to understand how to bring deviations into line with the reference design. This might require partners to provide additional resources to engage with Red Hat to work towards enabling their use case to achieve a best in class outcome with the platform. This is critical for the supportability of the solution and ensuring alignment across Red Hat and with partners.
Deviation from the RDS can have some or all of the following consequences:
- It can take longer to resolve issues.
- There is a risk of missing project service-level agreements (SLAs), project deadlines, end provider performance requirements, and so on.
Unapproved deviations may require escalation at executive levels.
NoteRed Hat prioritizes the servicing of requests for deviations based on partner engagement priorities.
3.2. Telco RAN DU reference design specification
3.2.1. Telco RAN DU 4.15 reference design overview
The Telco RAN distributed unit (DU) 4.15 reference design configures an OpenShift Container Platform 4.15 cluster running on commodity hardware to host telco RAN DU workloads. It captures the recommended, tested, and supported configurations to get reliable and repeatable performance for a cluster running the telco RAN DU profile.
3.2.1.1. OpenShift Container Platform 4.15 features for telco RAN DU
The 4.15 version of the telco RAN DU reference design specification (RDS) has the same feature set as the 4.14 version of the telco RAN DU RDS.
See OpenShift Container Platform 4.14 telco RAN DU features for more information.
3.2.1.2. Deployment architecture overview
You deploy the telco RAN DU 4.15 reference configuration to managed clusters from a centrally managed RHACM hub cluster. The reference design specification (RDS) includes configuration of the managed clusters and the hub cluster components.
Figure 3.1. Telco RAN DU deployment architecture overview
3.2.2. Telco RAN DU use model overview
Use the following information to plan telco RAN DU workloads, cluster resources, and hardware specifications for the hub cluster and managed single-node OpenShift clusters.
3.2.2.1. Telco RAN DU application workloads
DU worker nodes must have 3rd Generation Xeon (Ice Lake) 2.20 GHz or better CPUs with firmware tuned for maximum performance.
5G RAN DU user applications and workloads should conform to the following best practices and application limits:
- Develop cloud-native network functions (CNFs) that conform to the latest version of the CNF best practices guide.
- Use SR-IOV for high performance networking.
Use exec probes sparingly and only when no other suitable options are available
-
Do not use exec probes if a CNF uses CPU pinning. Use other probe implementations, for example,
httpGet
ortcpSocket
. - When you need to use exec probes, limit the exec probe frequency and quantity. The maximum number of exec probes must be kept below 10, and frequency must not be set to less than 10 seconds.
-
Do not use exec probes if a CNF uses CPU pinning. Use other probe implementations, for example,
Startup probes require minimal resources during steady-state operation. The limitation on exec probes applies primarily to liveness and readiness probes.
3.2.2.2. Telco RAN DU representative reference application workload characteristics
The representative reference application workload has the following characteristics:
- Has a maximum of 15 pods and 30 containers for the vRAN application including its management and control functions
-
Uses a maximum of 2
ConfigMap
and 4Secret
CRs per pod - Uses a maximum of 10 exec probes with a frequency of not less than 10 seconds
Incremental application load on the
kube-apiserver
is less than 10% of the cluster platform usageNoteYou can extract CPU load can from the platform metrics. For example:
query=avg_over_time(pod:container_cpu_usage:sum{namespace="openshift-kube-apiserver"}[30m])
- Application logs are not collected by the platform log collector
- Aggregate traffic on the primary CNI is less than 1 MBps
3.2.2.3. Telco RAN DU worker node cluster resource utilization
The maximum number of running pods in the system, inclusive of application workloads and OpenShift Container Platform pods, is 120.
- Resource utilization
OpenShift Container Platform resource utilization varies depending on many factors including application workload characteristics such as:
- Pod count
- Type and frequency of probes
- Messaging rates on primary CNI or secondary CNI with kernel networking
- API access rate
- Logging rates
- Storage IOPS
Cluster resource requirements are applicable under the following conditions:
- The cluster is running the described representative application workload.
- The cluster is managed with the constraints described in "Telco RAN DU worker node cluster resource utilization".
- Components noted as optional in the RAN DU use model configuration are not applied.
You will need to do additional analysis to determine the impact on resource utilization and ability to meet KPI targets for configurations outside the scope of the Telco RAN DU reference design. You might have to allocate additional resources in the cluster depending on your requirements.
Additional resources
3.2.2.4. Hub cluster management characteristics
Red Hat Advanced Cluster Management (RHACM) is the recommended cluster management solution. Configure it to the following limits on the hub cluster:
- Configure a maximum of 5 RHACM policies with a compliant evaluation interval of at least 10 minutes.
- Use a maximum of 10 managed cluster templates in policies. Where possible, use hub-side templating.
Disable all RHACM add-ons except for the
policy-controller
andobservability-controller
add-ons. SetObservability
to the default configuration.ImportantConfiguring optional components or enabling additional features will result in additional resource usage and can reduce overall system performance.
For more information, see Reference design deployment components.
Metric | Limit | Notes |
---|---|---|
CPU usage | Less than 4000 mc – 2 cores (4 hyperthreads) | Platform CPU is pinned to reserved cores, including both hyperthreads in each reserved core. The system is engineered to use 3 CPUs (3000mc) at steady-state to allow for periodic system tasks and spikes. |
Memory used | Less than 16G |
3.2.2.5. Telco RAN DU RDS components
The following sections describe the various OpenShift Container Platform components and configurations that you use to configure and deploy clusters to run telco RAN DU workloads.
Figure 3.2. Telco RAN DU reference design components
Ensure that components that are not included in the telco RAN DU profile do not affect the CPU resources allocated to workload applications.
Out of tree drivers are not supported.
Additional resources
- For details of the telco RAN RDS KPI test results, see Telco RAN DU reference design specification KPI test results. This information is only available to customers and partners.
3.2.3. Telco RAN DU 4.15 reference design components
The following sections describe the various OpenShift Container Platform components and configurations that you use to configure and deploy clusters to run RAN DU workloads.
3.2.3.1. Host firmware tuning
- New in this release
- No reference design updates in this release
- Description
Configure system level performance. See Configuring host firmware for low latency and high performance for recommended settings.
If Ironic inspection is enabled, the firmware setting values are available from the per-cluster
BareMetalHost
CR on the hub cluster. You enable Ironic inspection with a label in thespec.clusters.nodes
field in theSiteConfig
CR that you use to install the cluster. For example:nodes: - hostName: "example-node1.example.com" ironicInspect: "enabled"
NoteThe telco RAN DU reference
SiteConfig
does not enable theironicInspect
field by default.- Limits and requirements
- Hyperthreading must be enabled
- Engineering considerations
Tune all settings for maximum performance
NoteYou can tune firmware selections for power savings at the expense of performance as required.
3.2.3.2. Node Tuning Operator
- New in this release
- No reference design updates in this release
- Description
You tune the cluster performance by creating a performance profile. Settings that you configure with a performance profile include:
- Selecting the realtime or non-realtime kernel.
-
Allocating cores to a reserved or isolated
cpuset
. OpenShift Container Platform processes allocated to the management workload partition are pinned to reserved set. - Enabling kubelet features (CPU manager, topology manager, and memory manager).
- Configuring huge pages.
- Setting additional kernel arguments.
- Setting per-core power tuning and max CPU frequency.
- Limits and requirements
The Node Tuning Operator uses the
PerformanceProfile
CR to configure the cluster. You need to configure the following settings in the RAN DU profilePerformanceProfile
CR:- Select reserved and isolated cores and ensure that you allocate at least 4 hyperthreads (equivalent to 2 cores) on Intel 3rd Generation Xeon (Ice Lake) 2.20 GHz CPUs or better with firmware tuned for maximum performance.
-
Set the reserved
cpuset
to include both hyperthread siblings for each included core. Unreserved cores are available as allocatable CPU for scheduling workloads. Ensure that hyperthread siblings are not split across reserved and isolated cores. - Configure reserved and isolated CPUs to include all threads in all cores based on what you have set as reserved and isolated CPUs.
- Set core 0 of each NUMA node to be included in the reserved CPU set.
- Set the huge page size to 1G.
You should not add additional workloads to the management partition. Only those pods which are part of the OpenShift management platform should be annotated into the management partition.
- Engineering considerations
You should use the RT kernel to meet performance requirements.
NoteYou can use the non-RT kernel if required.
- The number of huge pages that you configure depends on the application workload requirements. Variation in this parameter is expected and allowed.
- Variation is expected in the configuration of reserved and isolated CPU sets based on selected hardware and additional components in use on the system. Variation must still meet the specified limits.
- Hardware without IRQ affinity support impacts isolated CPUs. To ensure that pods with guaranteed whole CPU QoS have full use of the allocated CPU, all hardware in the server must support IRQ affinity. For more information, see About support of IRQ affinity setting.
In OpenShift Container Platform 4.15, any PerformanceProfile
CR configured on the cluster causes the Node Tuning Operator to automatically set all cluster nodes to use cgroup v1.
For more information about cgroups, see Configuring Linux cgroup.
3.2.3.3. PTP Operator
- New in this release
- No reference design updates in this release
- Description
See PTP timing for details of support and configuration of PTP in cluster nodes. The DU node can run in the following modes:
- As an ordinary clock (OC) synced to a grandmaster clock or boundary clock (T-BC)
- As a grandmaster clock synced from GPS with support for single or dual card E810 Westport Channel NICs
As dual boundary clocks (one per NIC) with support for E810 Westport Channel NICs
NoteHighly available boundary clocks are not supported.
- Optional: as a boundary clock for radio units (RUs)
Events and metrics for grandmaster clocks are a Tech Preview feature added in the 4.14 telco RAN DU RDS. For more information see Using the PTP hardware fast event notifications framework.
You can subscribe applications to PTP events that happen on the node where the DU application is running.
- Limits and requirements
- High availability is not supported with dual NIC configurations.
- Digital Phase-Locked Loop (DPLL) clock synchronization is not supported for E810 Westport Channel NICs.
- GPS offsets are not reported. Use a default offset of less than or equal to 5.
- DPLL offsets are not reported. Use a default offset of less than or equal to 5.
- Engineering considerations
- Configurations are provided for ordinary clock, boundary clock, or grandmaster clock
-
PTP fast event notifications uses
ConfigMap
CRs to store PTP event subscriptions - Use Intel E810-XXV-4T Westport Channel NICs for PTP grandmaster clocks with GPS timing, minimum firmware version 4.40
3.2.3.4. SR-IOV Operator
- New in this release
- No reference design updates in this release
- Description
-
The SR-IOV Operator provisions and configures the SR-IOV CNI and device plugins. Both
netdevice
(kernel VFs) andvfio
(DPDK) devices are supported. - Engineering considerations
-
Customer variation on the configuration and number of
SriovNetwork
andSriovNetworkNodePolicy
custom resources (CRs) is expected. -
IOMMU kernel command line settings are applied with a
MachineConfig
CR at install time. This ensures that theSriovOperator
CR does not cause a reboot of the node when adding them.
-
Customer variation on the configuration and number of
3.2.3.5. Logging
- New in this release
- No reference design updates in this release
- Description
- Use logging to collect logs from the far edge node for remote analysis. The recommended log collector is Vector.
- Engineering considerations
- Handling logs beyond the infrastructure and audit logs, for example, from the application workload requires additional CPU and network bandwidth based on additional logging rate.
As of OpenShift Container Platform 4.14, Vector is the reference log collector.
NoteUse of fluentd in the RAN use model is deprecated.
3.2.3.6. SRIOV-FEC Operator
- New in this release
- No reference design updates in this release
- Description
- SRIOV-FEC Operator is an optional 3rd party Certified Operator supporting FEC accelerator hardware.
- Limits and requirements
Starting with FEC Operator v2.7.0:
-
SecureBoot
is supported -
The
vfio
driver for thePF
requires the usage ofvfio-token
that is injected into Pods. TheVF
token can be passed to DPDK by using the EAL parameter--vfio-vf-token
.
-
- Engineering considerations
-
The SRIOV-FEC Operator uses CPU cores from the
isolated
CPU set. - You can validate FEC readiness as part of the pre-checks for application deployment, for example, by extending the validation policy.
-
The SRIOV-FEC Operator uses CPU cores from the
3.2.3.7. Local Storage Operator
- New in this release
- No reference design updates in this release
- Description
-
You can create persistent volumes that can be used as
PVC
resources by applications with the Local Storage Operator. The number and type ofPV
resources that you create depends on your requirements. - Engineering considerations
-
Create backing storage for
PV
CRs before creating thePV
. This can be a partition, a local volume, LVM volume, or full disk. Refer to the device listing in
LocalVolume
CRs by the hardware path used to access each device to ensure correct allocation of disks and partitions. Logical names (for example,/dev/sda
) are not guaranteed to be consistent across node reboots.For more information, see the RHEL 9 documentation on device identifiers.
-
Create backing storage for
3.2.3.8. LVMS Operator
- New in this release
- No reference design updates in this release
- New in this release
-
Simplified LVMS
deviceSelector
logic -
LVM Storage with
ext4
andPV
resources
-
Simplified LVMS
LVMS Operator is an optional component.
- Description
The LVMS Operator provides dynamic provisioning of block and file storage. The LVMS Operator creates logical volumes from local devices that can be used as
PVC
resources by applications. Volume expansion and snapshots are also possible.The following example configuration creates a
vg1
volume group that leverages all available disks on the node except the installation disk:StorageLVMCluster.yaml
apiVersion: lvm.topolvm.io/v1alpha1 kind: LVMCluster metadata: name: storage-lvmcluster namespace: openshift-storage annotations: ran.openshift.io/ztp-deploy-wave: "10" spec: storage: deviceClasses: - name: vg1 thinPoolConfig: name: thin-pool-1 sizePercent: 90 overprovisionRatio: 10
- Limits and requirements
- Ceph is excluded when used on cluster topologies with fewer than 3 nodes. For example, Ceph is excluded in a single-node OpenShift cluster or single-node OpenShift cluster with a single worker node.
- In single-node OpenShift clusters, persistent storage must be provided by either LVMS or local storage, not both.
- Engineering considerations
- The LVMS Operator is not the reference storage solution for the DU use case. If you require LVMS Operator for application workloads, the resource use is accounted for against the application cores.
- Ensure that sufficient disks or partitions are available for storage requirements.
3.2.3.9. Workload partitioning
- New in this release
- No reference design updates in this release
- Description
Workload partitioning pins OpenShift platform and Day 2 Operator pods that are part of the DU profile to the reserved
cpuset
and removes the reserved CPU from node accounting. This leaves all unreserved CPU cores available for user workloads.The method of enabling and configuring workload partitioning changed in OpenShift Container Platform 4.14.
- 4.14 and later
Configure partitions by setting installation parameters:
cpuPartitioningMode: AllNodes
-
Configure management partition cores with the reserved CPU set in the
PerformanceProfile
CR
- 4.13 and earlier
-
Configure partitions with extra
MachineConfiguration
CRs applied at install-time
-
Configure partitions with extra
- Limits and requirements
-
Namespace
andPod
CRs must be annotated to allow the pod to be applied to the management partition - Pods with CPU limits cannot be allocated to the partition. This is because mutation can change the pod QoS.
- For more information about the minimum number of CPUs that can be allocated to the management partition, see Node Tuning Operator.
-
- Engineering considerations
- Workload Partitioning pins all management pods to reserved cores. A sufficient number of cores must be allocated to the reserved set to account for operating system, management pods, and expected spikes in CPU use that occur when the workload starts, the node reboots, or other system events happen.
3.2.3.10. Cluster tuning
- New in this release
- No reference design updates in this release
- Description
The cluster capabilities feature includes a
MachineAPI
component which, when excluded, disables the following Operators and their resources in the cluster:-
openshift/cluster-autoscaler-operator
-
openshift/cluster-control-plane-machine-set-operator
-
openshift/machine-api-operator
-
Use cluster capabilities to remove the Image Registry Operator.
- Limits and requirements
- Cluster capabilities are not available for installer-provisioned installation methods.
You must apply all platform tuning configurations. The following table lists the required platform tuning configurations:
Table 3.2. Cluster capabilities configurations Feature Description Remove optional cluster capabilities
Reduce the OpenShift Container Platform footprint by disabling optional cluster Operators on single-node OpenShift clusters only.
- Remove all optional Operators except the Marketplace and Node Tuning Operators.
Configure cluster monitoring
Configure the monitoring stack for reduced footprint by doing the following:
-
Disable the local
alertmanager
andtelemeter
components. -
If you use RHACM observability, the CR must be augmented with appropriate
additionalAlertManagerConfigs
CRs to forward alerts to the hub cluster. Reduce the
Prometheus
retention period to 24h.NoteThe RHACM hub cluster aggregates managed cluster metrics.
Disable networking diagnostics
Disable networking diagnostics for single-node OpenShift because they are not required.
Configure a single OperatorHub catalog source
Configure the cluster to use a single catalog source that contains only the Operators required for a RAN DU deployment. Each catalog source increases the CPU use on the cluster. Using a single
CatalogSource
fits within the platform CPU budget.
3.2.3.11. Machine configuration
- New in this release
- No reference design updates in this release
- Limits and requirements
The CRI-O wipe disable
MachineConfig
assumes that images on disk are static other than during scheduled maintenance in defined maintenance windows. To ensure the images are static, do not set the podimagePullPolicy
field toAlways
.Table 3.3. Machine configuration options Feature Description Container runtime
Sets the container runtime to
crun
for all node roles.kubelet config and container mount hiding
Reduces the frequency of kubelet housekeeping and eviction monitoring to reduce CPU usage. Create a container mount namespace, visible to kubelet and CRI-O, to reduce system mount scanning resource usage.
SCTP
Optional configuration (enabled by default) Enables SCTP. SCTP is required by RAN applications but disabled by default in RHCOS.
kdump
Optional configuration (enabled by default) Enables kdump to capture debug information when a kernel panic occurs.
CRI-O wipe disable
Disables automatic wiping of the CRI-O image cache after unclean shutdown.
SR-IOV-related kernel arguments
Includes additional SR-IOV related arguments in the kernel command line.
RCU Normal systemd service
Sets
rcu_normal
after the system is fully started.One-shot time sync
Runs a one-time system time synchronization job for control plane or worker nodes.
3.2.3.12. Reference design deployment components
The following sections describe the various OpenShift Container Platform components and configurations that you use to configure the hub cluster with Red Hat Advanced Cluster Management (RHACM).
3.2.3.12.1. Red Hat Advanced Cluster Management (RHACM)
- New in this release
- No reference design updates in this release
- Description
RHACM provides Multi Cluster Engine (MCE) installation and ongoing lifecycle management functionality for deployed clusters. You declaratively specify configurations and upgrades with
Policy
CRs and apply the policies to clusters with the RHACM policy controller as managed by Topology Aware Lifecycle Manager.- GitOps Zero Touch Provisioning (ZTP) uses the MCE feature of RHACM
- Configuration, upgrades, and cluster status are managed with the RHACM policy controller
During installation RHACM can apply labels to individual nodes as configured in the
SiteConfig
custom resource (CR).- Limits and requirements
-
A single hub cluster supports up to 3500 deployed single-node OpenShift clusters with 5
Policy
CRs bound to each cluster.
-
A single hub cluster supports up to 3500 deployed single-node OpenShift clusters with 5
- Engineering considerations
- Use RHACM policy hub-side templating to better scale cluster configuration. You can significantly reduce the number of policies by using a single group policy or small number of general group policies where the group and per-cluster values are substituted into templates.
-
Cluster specific configuration: managed clusters typically have some number of configuration values that are specific to the individual cluster. These configurations should be managed using RHACM policy hub-side templating with values pulled from
ConfigMap
CRs based on the cluster name. - To save CPU resources on managed clusters, policies that apply static configurations should be unbound from managed clusters after GitOps ZTP installation of the cluster. For more information, see Release a persistent volume.
3.2.3.12.2. Topology Aware Lifecycle Manager (TALM)
- New in this release
- No reference design updates in this release
- Description
- Managed updates
TALM is an Operator that runs only on the hub cluster for managing how changes (including cluster and Operator upgrades, configuration, and so on) are rolled out to the network. TALM does the following:
-
Progressively applies updates to fleets of clusters in user-configurable batches by using
Policy
CRs. -
Adds
ztp-done
labels or other user configurable labels on a per-cluster basis
-
Progressively applies updates to fleets of clusters in user-configurable batches by using
- Precaching for single-node OpenShift clusters
TALM supports optional precaching of OpenShift Container Platform, OLM Operator, and additional user images to single-node OpenShift clusters before initiating an upgrade.
A
PreCachingConfig
custom resource is available for specifying optional pre-caching configurations. For example:apiVersion: ran.openshift.io/v1alpha1 kind: PreCachingConfig metadata: name: example-config namespace: example-ns spec: additionalImages: - quay.io/foobar/application1@sha256:3d5800990dee7cd4727d3fe238a97e2d2976d3808fc925ada29c559a47e2e - quay.io/foobar/application2@sha256:3d5800123dee7cd4727d3fe238a97e2d2976d3808fc925ada29c559a47adf - quay.io/foobar/applicationN@sha256:4fe1334adfafadsf987123adfffdaf1243340adfafdedga0991234afdadfs spaceRequired: 45 GiB 1 overrides: preCacheImage: quay.io/test_images/pre-cache:latest platformImage: quay.io/openshift-release-dev/ocp-release@sha256:3d5800990dee7cd4727d3fe238a97e2d2976d3808fc925ada29c559a47e2e operatorsIndexes: - registry.example.com:5000/custom-redhat-operators:1.0.0 operatorsPackagesAndChannels: - local-storage-operator: stable - ptp-operator: stable - sriov-network-operator: stable excludePrecachePatterns: 2 - aws - vsphere
- Backup and restore for single-node OpenShift
- TALM supports taking a snapshot of the cluster operating system and configuration to a dedicated partition on a local disk. A restore script is provided that returns the cluster to the backed up state.
- Limits and requirements
- TALM supports concurrent cluster deployment in batches of 400
- Precaching and backup features are for single-node OpenShift clusters only.
- Engineering considerations
-
The
PreCachingConfig
CR is optional and does not need to be created if you just wants to precache platform related (OpenShift and OLM Operator) images. ThePreCachingConfig
CR must be applied before referencing it in theClusterGroupUpgrade
CR. - Create a recovery partition during installation if you opt to use the TALM backup and restore feature.
-
The
3.2.3.12.3. GitOps and GitOps ZTP plugins
- New in this release
- No reference design updates in this release
- Description
GitOps and GitOps ZTP plugins provide a GitOps-based infrastructure for managing cluster deployment and configuration. Cluster definitions and configurations are maintained as a declarative state in Git. ZTP plugins provide support for generating installation CRs from the
SiteConfig
CR and automatic wrapping of configuration CRs in policies based onPolicyGenTemplate
CRs.You can deploy and manage multiple versions of OpenShift Container Platform on managed clusters using the baseline reference configuration CRs. You can also use custom CRs alongside the baseline CRs.
- Limits
-
300
SiteConfig
CRs per ArgoCD application. You can use multiple applications to achieve the maximum number of clusters supported by a single hub cluster. -
Content in the
/source-crs
folder in Git overrides content provided in the GitOps ZTP plugin container. Git takes precedence in the search path. Add the
/source-crs
folder in the same directory as thekustomization.yaml
file, which includes thePolicyGenTemplate
as a generator.NoteAlternative locations for the
/source-crs
directory are not supported in this context.
-
300
- Engineering considerations
-
To avoid confusion or unintentional overwriting of files when updating content, use unique and distinguishable names for user-provided CRs in the
/source-crs
folder and extra manifests in Git. -
The
SiteConfig
CR allows multiple extra-manifest paths. When files with the same name are found in multiple directory paths, the last file found takes precedence. This allows you to put the full set of version-specific Day 0 manifests (extra-manifests) in Git and reference them from theSiteConfig
CR. With this feature, you can deploy multiple OpenShift Container Platform versions to managed clusters simultaneously. -
The
extraManifestPath
field of theSiteConfig
CR is deprecated from OpenShift Container Platform 4.15 and later. Use the newextraManifests.searchPaths
field instead.
-
To avoid confusion or unintentional overwriting of files when updating content, use unique and distinguishable names for user-provided CRs in the
3.2.3.12.4. Agent-based installer
- New in this release
- No reference design updates in this release
- Description
Agent-based installer (ABI) provides installation capabilities without centralized infrastructure. The installation program creates an ISO image that you mount to the server. When the server boots it installs OpenShift Container Platform and supplied extra manifests.
NoteYou can also use ABI to install OpenShift Container Platform clusters without a hub cluster. An image registry is still required when you use ABI in this manner.
Agent-based installer (ABI) is an optional component.
- Limits and requirements
- You can supply a limited set of additional manifests at installation time.
-
You must include
MachineConfiguration
CRs that are required by the RAN DU use case.
- Engineering considerations
- ABI provides a baseline OpenShift Container Platform installation.
- You install Day 2 Operators and the remainder of the RAN DU use case configurations after installation.
3.2.3.13. Additional components
3.2.3.13.1. Bare Metal Event Relay
The Bare Metal Event Relay is an optional Operator that runs exclusively on the managed spoke cluster. It relays Redfish hardware events to cluster applications.
The Bare Metal Event Relay is not included in the RAN DU use model reference configuration and is an optional feature. If you want to use the Bare Metal Event Relay, assign additional CPU resources from the application CPU budget.
3.2.4. Telco RAN distributed unit (DU) reference configuration CRs
Use the following custom resources (CRs) to configure and deploy OpenShift Container Platform clusters with the telco RAN DU profile. Some of the CRs are optional depending on your requirements. CR fields you can change are annotated in the CR with YAML comments.
You can extract the complete set of RAN DU CRs from the ztp-site-generate
container image. See Preparing the GitOps ZTP site configuration repository for more information.
3.2.4.1. Day 2 Operators reference CRs
Component | Reference CR | Optional | New in this release |
---|---|---|---|
Cluster logging | No | No | |
Cluster logging | No | No | |
Cluster logging | No | No | |
Cluster logging | No | No | |
Cluster logging | No | No | |
Local Storage Operator | Yes | No | |
Local Storage Operator | Yes | No | |
Local Storage Operator | Yes | No | |
Local Storage Operator | Yes | No | |
Local Storage Operator | Yes | No | |
Node Tuning Operator | No | No | |
Node Tuning Operator | No | No | |
PTP fast event notifications | Yes | No | |
PTP Operator | No | No | |
PTP Operator | No | No | |
PTP Operator | No | No | |
PTP Operator | No | No | |
PTP Operator | No | No | |
PTP Operator | No | No | |
PTP Operator | No | No | |
SR-IOV FEC Operator | Yes | No | |
SR-IOV FEC Operator | Yes | No | |
SR-IOV FEC Operator | Yes | No | |
SR-IOV FEC Operator | Yes | No | |
SR-IOV Operator | No | No | |
SR-IOV Operator | No | No | |
SR-IOV Operator | No | No | |
SR-IOV Operator | No | No | |
SR-IOV Operator | No | No | |
SR-IOV Operator | No | No |
3.2.4.2. Cluster tuning reference CRs
Component | Reference CR | Optional | New in this release |
---|---|---|---|
Cluster capabilities | No | No | |
Disabling network diagnostics | No | No | |
Disconnected Registry | No | Yes | |
Monitoring configuration | No | No | |
OperatorHub | No | No | |
OperatorHub | No | No | |
OperatorHub | No | No | |
OperatorHub | Yes | No |
3.2.4.3. Machine configuration reference CRs
Component | Reference CR | Optional | New in this release |
---|---|---|---|
Container runtime (crun) | No | No | |
Container runtime (crun) | No | No | |
Disabling CRI-O wipe | No | No | |
Disabling CRI-O wipe | No | No | |
Enable cgroup v1 | No | No | |
Enabling kdump | No | No | |
Enabling kdump | No | No | |
Enabling kdump | No | No | |
Enabling kdump | No | No | |
kubelet configuration and container mount hiding | No | No | |
kubelet configuration and container mount hiding | No | No | |
One-shot time sync | No | No | |
One-shot time sync | No | No | |
SCTP | No | No | |
SCTP | No | No | |
Set RCU Normal | No | No | |
Set RCU Normal | No | No | |
SR-IOV related kernel arguments | No | No | |
SR-IOV related kernel arguments | No | No |
3.2.4.4. YAML reference
The following is a complete reference for all the custom resources (CRs) that make up the telco RAN DU 4.15 reference configuration.
3.2.4.4.1. Day 2 Operators reference YAML
ClusterLogForwarder.yaml
apiVersion: "logging.openshift.io/v1" kind: ClusterLogForwarder metadata: name: instance namespace: openshift-logging annotations: {} spec: outputs: $outputs pipelines: $pipelines
ClusterLogging.yaml
apiVersion: logging.openshift.io/v1 kind: ClusterLogging metadata: name: instance namespace: openshift-logging annotations: {} spec: managementState: "Managed" collection: logs: type: "vector"
ClusterLogNS.yaml
--- apiVersion: v1 kind: Namespace metadata: name: openshift-logging annotations: workload.openshift.io/allowed: management
ClusterLogOperGroup.yaml
--- apiVersion: operators.coreos.com/v1 kind: OperatorGroup metadata: name: cluster-logging namespace: openshift-logging annotations: {} spec: targetNamespaces: - openshift-logging
ClusterLogSubscription.yaml
apiVersion: operators.coreos.com/v1alpha1 kind: Subscription metadata: name: cluster-logging namespace: openshift-logging annotations: {} spec: channel: "stable" name: cluster-logging source: redhat-operators-disconnected sourceNamespace: openshift-marketplace installPlanApproval: Manual status: state: AtLatestKnown
StorageClass.yaml
apiVersion: storage.k8s.io/v1 kind: StorageClass metadata: annotations: {} name: example-storage-class provisioner: kubernetes.io/no-provisioner reclaimPolicy: Delete
StorageLV.yaml
apiVersion: "local.storage.openshift.io/v1" kind: "LocalVolume" metadata: name: "local-disks" namespace: "openshift-local-storage" annotations: {} spec: logLevel: Normal managementState: Managed storageClassDevices: # The list of storage classes and associated devicePaths need to be specified like this example: - storageClassName: "example-storage-class" volumeMode: Filesystem fsType: xfs # The below must be adjusted to the hardware. # For stability and reliability, it's recommended to use persistent # naming conventions for devicePaths, such as /dev/disk/by-path. devicePaths: - /dev/disk/by-path/pci-0000:05:00.0-nvme-1 #--- ## How to verify ## 1. Create a PVC # apiVersion: v1 # kind: PersistentVolumeClaim # metadata: # name: local-pvc-name # spec: # accessModes: # - ReadWriteOnce # volumeMode: Filesystem # resources: # requests: # storage: 100Gi # storageClassName: example-storage-class #--- ## 2. Create a pod that mounts it # apiVersion: v1 # kind: Pod # metadata: # labels: # run: busybox # name: busybox # spec: # containers: # - image: quay.io/quay/busybox:latest # name: busybox # resources: {} # command: ["/bin/sh", "-c", "sleep infinity"] # volumeMounts: # - name: local-pvc # mountPath: /data # volumes: # - name: local-pvc # persistentVolumeClaim: # claimName: local-pvc-name # dnsPolicy: ClusterFirst # restartPolicy: Always ## 3. Run the pod on the cluster and verify the size and access of the `/data` mount
StorageNS.yaml
apiVersion: v1 kind: Namespace metadata: name: openshift-local-storage annotations: workload.openshift.io/allowed: management
StorageOperGroup.yaml
apiVersion: operators.coreos.com/v1 kind: OperatorGroup metadata: name: openshift-local-storage namespace: openshift-local-storage annotations: {} spec: targetNamespaces: - openshift-local-storage
StorageSubscription.yaml
apiVersion: operators.coreos.com/v1alpha1 kind: Subscription metadata: name: local-storage-operator namespace: openshift-local-storage annotations: {} spec: channel: "stable" name: local-storage-operator source: redhat-operators-disconnected sourceNamespace: openshift-marketplace installPlanApproval: Manual status: state: AtLatestKnown
PerformanceProfile.yaml
apiVersion: performance.openshift.io/v2 kind: PerformanceProfile metadata: # if you change this name make sure the 'include' line in TunedPerformancePatch.yaml # matches this name: include=openshift-node-performance-${PerformanceProfile.metadata.name} # Also in file 'validatorCRs/informDuValidator.yaml': # name: 50-performance-${PerformanceProfile.metadata.name} name: openshift-node-performance-profile annotations: ran.openshift.io/reference-configuration: "ran-du.redhat.com" spec: additionalKernelArgs: - "rcupdate.rcu_normal_after_boot=0" - "efi=runtime" - "vfio_pci.enable_sriov=1" - "vfio_pci.disable_idle_d3=1" - "module_blacklist=irdma" cpu: isolated: $isolated reserved: $reserved hugepages: defaultHugepagesSize: $defaultHugepagesSize pages: - size: $size count: $count node: $node machineConfigPoolSelector: pools.operator.machineconfiguration.openshift.io/$mcp: "" nodeSelector: node-role.kubernetes.io/$mcp: '' numa: topologyPolicy: "restricted" # To use the standard (non-realtime) kernel, set enabled to false realTimeKernel: enabled: true workloadHints: # WorkloadHints defines the set of upper level flags for different type of workloads. # See https://github.com/openshift/cluster-node-tuning-operator/blob/master/docs/performanceprofile/performance_profile.md#workloadhints # for detailed descriptions of each item. # The configuration below is set for a low latency, performance mode. realTime: true highPowerConsumption: false perPodPowerManagement: false
TunedPerformancePatch.yaml
apiVersion: tuned.openshift.io/v1 kind: Tuned metadata: name: performance-patch namespace: openshift-cluster-node-tuning-operator annotations: {} spec: profile: - name: performance-patch # Please note: # - The 'include' line must match the associated PerformanceProfile name, following below pattern # include=openshift-node-performance-${PerformanceProfile.metadata.name} # - When using the standard (non-realtime) kernel, remove the kernel.timer_migration override from # the [sysctl] section and remove the entire section if it is empty. data: | [main] summary=Configuration changes profile inherited from performance created tuned include=openshift-node-performance-openshift-node-performance-profile [sysctl] kernel.timer_migration=1 [scheduler] group.ice-ptp=0:f:10:*:ice-ptp.* group.ice-gnss=0:f:10:*:ice-gnss.* [service] service.stalld=start,enable service.chronyd=stop,disable recommend: - machineConfigLabels: machineconfiguration.openshift.io/role: "$mcp" priority: 19 profile: performance-patch
PtpOperatorConfigForEvent.yaml
apiVersion: ptp.openshift.io/v1 kind: PtpOperatorConfig metadata: name: default namespace: openshift-ptp annotations: {} spec: daemonNodeSelector: node-role.kubernetes.io/$mcp: "" ptpEventConfig: enableEventPublisher: true transportHost: "http://ptp-event-publisher-service-NODE_NAME.openshift-ptp.svc.cluster.local:9043"
PtpConfigBoundary.yaml
apiVersion: ptp.openshift.io/v1 kind: PtpConfig metadata: name: boundary namespace: openshift-ptp annotations: {} spec: profile: - name: "boundary" ptp4lOpts: "-2" phc2sysOpts: "-a -r -n 24" ptpSchedulingPolicy: SCHED_FIFO ptpSchedulingPriority: 10 ptpSettings: logReduce: "true" ptp4lConf: | # The interface name is hardware-specific [$iface_slave] masterOnly 0 [$iface_master_1] masterOnly 1 [$iface_master_2] masterOnly 1 [$iface_master_3] masterOnly 1 [global] # # Default Data Set # twoStepFlag 1 slaveOnly 0 priority1 128 priority2 128 domainNumber 24 #utc_offset 37 clockClass 248 clockAccuracy 0xFE offsetScaledLogVariance 0xFFFF free_running 0 freq_est_interval 1 dscp_event 0 dscp_general 0 dataset_comparison G.8275.x G.8275.defaultDS.localPriority 128 # # Port Data Set # logAnnounceInterval -3 logSyncInterval -4 logMinDelayReqInterval -4 logMinPdelayReqInterval -4 announceReceiptTimeout 3 syncReceiptTimeout 0 delayAsymmetry 0 fault_reset_interval -4 neighborPropDelayThresh 20000000 masterOnly 0 G.8275.portDS.localPriority 128 # # Run time options # assume_two_step 0 logging_level 6 path_trace_enabled 0 follow_up_info 0 hybrid_e2e 0 inhibit_multicast_service 0 net_sync_monitor 0 tc_spanning_tree 0 tx_timestamp_timeout 50 unicast_listen 0 unicast_master_table 0 unicast_req_duration 3600 use_syslog 1 verbose 0 summary_interval 0 kernel_leap 1 check_fup_sync 0 clock_class_threshold 135 # # Servo Options # pi_proportional_const 0.0 pi_integral_const 0.0 pi_proportional_scale 0.0 pi_proportional_exponent -0.3 pi_proportional_norm_max 0.7 pi_integral_scale 0.0 pi_integral_exponent 0.4 pi_integral_norm_max 0.3 step_threshold 2.0 first_step_threshold 0.00002 max_frequency 900000000 clock_servo pi sanity_freq_limit 200000000 ntpshm_segment 0 # # Transport options # transportSpecific 0x0 ptp_dst_mac 01:1B:19:00:00:00 p2p_dst_mac 01:80:C2:00:00:0E udp_ttl 1 udp6_scope 0x0E uds_address /var/run/ptp4l # # Default interface options # clock_type BC network_transport L2 delay_mechanism E2E time_stamping hardware tsproc_mode filter delay_filter moving_median delay_filter_length 10 egressLatency 0 ingressLatency 0 boundary_clock_jbod 0 # # Clock description # productDescription ;; revisionData ;; manufacturerIdentity 00:00:00 userDescription ; timeSource 0xA0 recommend: - profile: "boundary" priority: 4 match: - nodeLabel: "node-role.kubernetes.io/$mcp"
PtpConfigDualCardGmWpc.yaml
# In this example 2 cards $iface_master and $iface_master_1 are connected via SMA1 ports by a cable # and $iface_master_1 receives 1PPS signals from $iface_master apiVersion: ptp.openshift.io/v1 kind: PtpConfig metadata: name: grandmaster namespace: openshift-ptp annotations: ran.openshift.io/ztp-deploy-wave: "10" spec: profile: - name: "grandmaster" ptp4lOpts: "-2 --summary_interval -4" phc2sysOpts: -r -u 0 -m -w -N 8 -R 16 -s $iface_master -n 24 ptpSchedulingPolicy: SCHED_FIFO ptpSchedulingPriority: 10 ptpSettings: logReduce: "true" plugins: e810: enableDefaultConfig: false settings: LocalMaxHoldoverOffSet: 1500 LocalHoldoverTimeout: 14400 MaxInSpecOffset: 100 pins: $e810_pins # "$iface_master": # "U.FL2": "0 2" # "U.FL1": "0 1" # "SMA2": "0 2" # "SMA1": "2 1" # "$iface_master_1": # "U.FL2": "0 2" # "U.FL1": "0 1" # "SMA2": "0 2" # "SMA1": "1 1" ublxCmds: - args: #ubxtool -P 29.20 -z CFG-HW-ANT_CFG_VOLTCTRL,1 - "-P" - "29.20" - "-z" - "CFG-HW-ANT_CFG_VOLTCTRL,1" reportOutput: false - args: #ubxtool -P 29.20 -e GPS - "-P" - "29.20" - "-e" - "GPS" reportOutput: false - args: #ubxtool -P 29.20 -d Galileo - "-P" - "29.20" - "-d" - "Galileo" reportOutput: false - args: #ubxtool -P 29.20 -d GLONASS - "-P" - "29.20" - "-d" - "GLONASS" reportOutput: false - args: #ubxtool -P 29.20 -d BeiDou - "-P" - "29.20" - "-d" - "BeiDou" reportOutput: false - args: #ubxtool -P 29.20 -d SBAS - "-P" - "29.20" - "-d" - "SBAS" reportOutput: false - args: #ubxtool -P 29.20 -t -w 5 -v 1 -e SURVEYIN,600,50000 - "-P" - "29.20" - "-t" - "-w" - "5" - "-v" - "1" - "-e" - "SURVEYIN,600,50000" reportOutput: true - args: #ubxtool -P 29.20 -p MON-HW - "-P" - "29.20" - "-p" - "MON-HW" reportOutput: true - args: #ubxtool -P 29.20 -p CFG-MSG,1,38,248 - "-P" - "29.20" - "-p" - "CFG-MSG,1,38,248" reportOutput: true ts2phcOpts: " " ts2phcConf: | [nmea] ts2phc.master 1 [global] use_syslog 0 verbose 1 logging_level 7 ts2phc.pulsewidth 100000000 #cat /dev/GNSS to find available serial port #example value of gnss_serialport is /dev/ttyGNSS_1700_0 ts2phc.nmea_serialport $gnss_serialport leapfile /usr/share/zoneinfo/leap-seconds.list [$iface_master] ts2phc.extts_polarity rising ts2phc.extts_correction 0 [$iface_master_1] ts2phc.extts_polarity rising #this is a measured value in nanoseconds to compensate for SMA cable delay ts2phc.extts_correction -10 ptp4lConf: | [$iface_master] masterOnly 1 [$iface_master_1] masterOnly 1 [$iface_master_1_1] masterOnly 1 [$iface_master_1_2] masterOnly 1 [global] # # Default Data Set # twoStepFlag 1 priority1 128 priority2 128 domainNumber 24 #utc_offset 37 clockClass 6 clockAccuracy 0x27 offsetScaledLogVariance 0xFFFF free_running 0 freq_est_interval 1 dscp_event 0 dscp_general 0 dataset_comparison G.8275.x G.8275.defaultDS.localPriority 128 # # Port Data Set # logAnnounceInterval -3 logSyncInterval -4 logMinDelayReqInterval -4 logMinPdelayReqInterval 0 announceReceiptTimeout 3 syncReceiptTimeout 0 delayAsymmetry 0 fault_reset_interval -4 neighborPropDelayThresh 20000000 masterOnly 0 G.8275.portDS.localPriority 128 # # Run time options # assume_two_step 0 logging_level 6 path_trace_enabled 0 follow_up_info 0 hybrid_e2e 0 inhibit_multicast_service 0 net_sync_monitor 0 tc_spanning_tree 0 tx_timestamp_timeout 50 unicast_listen 0 unicast_master_table 0 unicast_req_duration 3600 use_syslog 1 verbose 0 summary_interval -4 kernel_leap 1 check_fup_sync 0 clock_class_threshold 7 # # Servo Options # pi_proportional_const 0.0 pi_integral_const 0.0 pi_proportional_scale 0.0 pi_proportional_exponent -0.3 pi_proportional_norm_max 0.7 pi_integral_scale 0.0 pi_integral_exponent 0.4 pi_integral_norm_max 0.3 step_threshold 2.0 first_step_threshold 0.00002 clock_servo pi sanity_freq_limit 200000000 ntpshm_segment 0 # # Transport options # transportSpecific 0x0 ptp_dst_mac 01:1B:19:00:00:00 p2p_dst_mac 01:80:C2:00:00:0E udp_ttl 1 udp6_scope 0x0E uds_address /var/run/ptp4l # # Default interface options # clock_type BC network_transport L2 delay_mechanism E2E time_stamping hardware tsproc_mode filter delay_filter moving_median delay_filter_length 10 egressLatency 0 ingressLatency 0 boundary_clock_jbod 1 # # Clock description # productDescription ;; revisionData ;; manufacturerIdentity 00:00:00 userDescription ; timeSource 0x20 recommend: - profile: "grandmaster" priority: 4 match: - nodeLabel: "node-role.kubernetes.io/$mcp"
PtpConfigGmWpc.yaml
# The grandmaster profile is provided for testing only # It is not installed on production clusters apiVersion: ptp.openshift.io/v1 kind: PtpConfig metadata: name: grandmaster namespace: openshift-ptp annotations: {} spec: profile: - name: "grandmaster" ptp4lOpts: "-2 --summary_interval -4" phc2sysOpts: -r -u 0 -m -O -37 -N 8 -R 16 -s $iface_master -n 24 ptpSchedulingPolicy: SCHED_FIFO ptpSchedulingPriority: 10 ptpSettings: logReduce: "true" plugins: e810: enableDefaultConfig: false settings: LocalMaxHoldoverOffSet: 1500 LocalHoldoverTimeout: 14400 MaxInSpecOffset: 100 pins: $e810_pins # "$iface_master": # "U.FL2": "0 2" # "U.FL1": "0 1" # "SMA2": "0 2" # "SMA1": "0 1" ublxCmds: - args: #ubxtool -P 29.20 -z CFG-HW-ANT_CFG_VOLTCTRL,1 - "-P" - "29.20" - "-z" - "CFG-HW-ANT_CFG_VOLTCTRL,1" reportOutput: false - args: #ubxtool -P 29.20 -e GPS - "-P" - "29.20" - "-e" - "GPS" reportOutput: false - args: #ubxtool -P 29.20 -d Galileo - "-P" - "29.20" - "-d" - "Galileo" reportOutput: false - args: #ubxtool -P 29.20 -d GLONASS - "-P" - "29.20" - "-d" - "GLONASS" reportOutput: false - args: #ubxtool -P 29.20 -d BeiDou - "-P" - "29.20" - "-d" - "BeiDou" reportOutput: false - args: #ubxtool -P 29.20 -d SBAS - "-P" - "29.20" - "-d" - "SBAS" reportOutput: false - args: #ubxtool -P 29.20 -t -w 5 -v 1 -e SURVEYIN,600,50000 - "-P" - "29.20" - "-t" - "-w" - "5" - "-v" - "1" - "-e" - "SURVEYIN,600,50000" reportOutput: true - args: #ubxtool -P 29.20 -p MON-HW - "-P" - "29.20" - "-p" - "MON-HW" reportOutput: true - args: #ubxtool -P 29.20 -p CFG-MSG,1,38,300 - "-P" - "29.20" - "-p" - "CFG-MSG,1,38,300" reportOutput: true ts2phcOpts: " " ts2phcConf: | [nmea] ts2phc.master 1 [global] use_syslog 0 verbose 1 logging_level 7 ts2phc.pulsewidth 100000000 #cat /dev/GNSS to find available serial port #example value of gnss_serialport is /dev/ttyGNSS_1700_0 ts2phc.nmea_serialport $gnss_serialport leapfile /usr/share/zoneinfo/leap-seconds.list [$iface_master] ts2phc.extts_polarity rising ts2phc.extts_correction 0 ptp4lConf: | [$iface_master] masterOnly 1 [$iface_master_1] masterOnly 1 [$iface_master_2] masterOnly 1 [$iface_master_3] masterOnly 1 [global] # # Default Data Set # twoStepFlag 1 priority1 128 priority2 128 domainNumber 24 #utc_offset 37 clockClass 6 clockAccuracy 0x27 offsetScaledLogVariance 0xFFFF free_running 0 freq_est_interval 1 dscp_event 0 dscp_general 0 dataset_comparison G.8275.x G.8275.defaultDS.localPriority 128 # # Port Data Set # logAnnounceInterval -3 logSyncInterval -4 logMinDelayReqInterval -4 logMinPdelayReqInterval 0 announceReceiptTimeout 3 syncReceiptTimeout 0 delayAsymmetry 0 fault_reset_interval -4 neighborPropDelayThresh 20000000 masterOnly 0 G.8275.portDS.localPriority 128 # # Run time options # assume_two_step 0 logging_level 6 path_trace_enabled 0 follow_up_info 0 hybrid_e2e 0 inhibit_multicast_service 0 net_sync_monitor 0 tc_spanning_tree 0 tx_timestamp_timeout 50 unicast_listen 0 unicast_master_table 0 unicast_req_duration 3600 use_syslog 1 verbose 0 summary_interval -4 kernel_leap 1 check_fup_sync 0 clock_class_threshold 7 # # Servo Options # pi_proportional_const 0.0 pi_integral_const 0.0 pi_proportional_scale 0.0 pi_proportional_exponent -0.3 pi_proportional_norm_max 0.7 pi_integral_scale 0.0 pi_integral_exponent 0.4 pi_integral_norm_max 0.3 step_threshold 2.0 first_step_threshold 0.00002 clock_servo pi sanity_freq_limit 200000000 ntpshm_segment 0 # # Transport options # transportSpecific 0x0 ptp_dst_mac 01:1B:19:00:00:00 p2p_dst_mac 01:80:C2:00:00:0E udp_ttl 1 udp6_scope 0x0E uds_address /var/run/ptp4l # # Default interface options # clock_type BC network_transport L2 delay_mechanism E2E time_stamping hardware tsproc_mode filter delay_filter moving_median delay_filter_length 10 egressLatency 0 ingressLatency 0 boundary_clock_jbod 0 # # Clock description # productDescription ;; revisionData ;; manufacturerIdentity 00:00:00 userDescription ; timeSource 0x20 recommend: - profile: "grandmaster" priority: 4 match: - nodeLabel: "node-role.kubernetes.io/$mcp"
PtpConfigSlave.yaml
apiVersion: ptp.openshift.io/v1 kind: PtpConfig metadata: name: slave namespace: openshift-ptp annotations: {} spec: profile: - name: "slave" # The interface name is hardware-specific interface: $interface ptp4lOpts: "-2 -s" phc2sysOpts: "-a -r -n 24" ptpSchedulingPolicy: SCHED_FIFO ptpSchedulingPriority: 10 ptpSettings: logReduce: "true" ptp4lConf: | [global] # # Default Data Set # twoStepFlag 1 slaveOnly 1 priority1 128 priority2 128 domainNumber 24 #utc_offset 37 clockClass 255 clockAccuracy 0xFE offsetScaledLogVariance 0xFFFF free_running 0 freq_est_interval 1 dscp_event 0 dscp_general 0 dataset_comparison G.8275.x G.8275.defaultDS.localPriority 128 # # Port Data Set # logAnnounceInterval -3 logSyncInterval -4 logMinDelayReqInterval -4 logMinPdelayReqInterval -4 announceReceiptTimeout 3 syncReceiptTimeout 0 delayAsymmetry 0 fault_reset_interval -4 neighborPropDelayThresh 20000000 masterOnly 0 G.8275.portDS.localPriority 128 # # Run time options # assume_two_step 0 logging_level 6 path_trace_enabled 0 follow_up_info 0 hybrid_e2e 0 inhibit_multicast_service 0 net_sync_monitor 0 tc_spanning_tree 0 tx_timestamp_timeout 50 unicast_listen 0 unicast_master_table 0 unicast_req_duration 3600 use_syslog 1 verbose 0 summary_interval 0 kernel_leap 1 check_fup_sync 0 clock_class_threshold 7 # # Servo Options # pi_proportional_const 0.0 pi_integral_const 0.0 pi_proportional_scale 0.0 pi_proportional_exponent -0.3 pi_proportional_norm_max 0.7 pi_integral_scale 0.0 pi_integral_exponent 0.4 pi_integral_norm_max 0.3 step_threshold 2.0 first_step_threshold 0.00002 max_frequency 900000000 clock_servo pi sanity_freq_limit 200000000 ntpshm_segment 0 # # Transport options # transportSpecific 0x0 ptp_dst_mac 01:1B:19:00:00:00 p2p_dst_mac 01:80:C2:00:00:0E udp_ttl 1 udp6_scope 0x0E uds_address /var/run/ptp4l # # Default interface options # clock_type OC network_transport L2 delay_mechanism E2E time_stamping hardware tsproc_mode filter delay_filter moving_median delay_filter_length 10 egressLatency 0 ingressLatency 0 boundary_clock_jbod 0 # # Clock description # productDescription ;; revisionData ;; manufacturerIdentity 00:00:00 userDescription ; timeSource 0xA0 recommend: - profile: "slave" priority: 4 match: - nodeLabel: "node-role.kubernetes.io/$mcp"
PtpSubscription.yaml
--- apiVersion: operators.coreos.com/v1alpha1 kind: Subscription metadata: name: ptp-operator-subscription namespace: openshift-ptp annotations: {} spec: channel: "stable" name: ptp-operator source: redhat-operators-disconnected sourceNamespace: openshift-marketplace installPlanApproval: Manual status: state: AtLatestKnown
PtpSubscriptionNS.yaml
--- apiVersion: v1 kind: Namespace metadata: name: openshift-ptp annotations: workload.openshift.io/allowed: management labels: openshift.io/cluster-monitoring: "true"
PtpSubscriptionOperGroup.yaml
apiVersion: operators.coreos.com/v1 kind: OperatorGroup metadata: name: ptp-operators namespace: openshift-ptp annotations: {} spec: targetNamespaces: - openshift-ptp
AcceleratorsNS.yaml
apiVersion: v1 kind: Namespace metadata: name: vran-acceleration-operators annotations: {}
AcceleratorsOperGroup.yaml
apiVersion: operators.coreos.com/v1 kind: OperatorGroup metadata: name: vran-operators namespace: vran-acceleration-operators annotations: {} spec: targetNamespaces: - vran-acceleration-operators
AcceleratorsSubscription.yaml
apiVersion: operators.coreos.com/v1alpha1 kind: Subscription metadata: name: sriov-fec-subscription namespace: vran-acceleration-operators annotations: {} spec: channel: stable name: sriov-fec source: certified-operators sourceNamespace: openshift-marketplace installPlanApproval: Manual status: state: AtLatestKnown
SriovFecClusterConfig.yaml
apiVersion: sriovfec.intel.com/v2 kind: SriovFecClusterConfig metadata: name: config namespace: vran-acceleration-operators annotations: {} spec: drainSkip: $drainSkip # true if SNO, false by default priority: 1 nodeSelector: node-role.kubernetes.io/master: "" acceleratorSelector: pciAddress: $pciAddress physicalFunction: pfDriver: "vfio-pci" vfDriver: "vfio-pci" vfAmount: 16 bbDevConfig: $bbDevConfig #Recommended configuration for Intel ACC100 (Mount Bryce) FPGA here: https://github.com/smart-edge-open/openshift-operator/blob/main/spec/openshift-sriov-fec-operator.md#sample-cr-for-wireless-fec-acc100 #Recommended configuration for Intel N3000 FPGA here: https://github.com/smart-edge-open/openshift-operator/blob/main/spec/openshift-sriov-fec-operator.md#sample-cr-for-wireless-fec-n3000
SriovNetwork.yaml
apiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetwork metadata: name: "" namespace: openshift-sriov-network-operator annotations: {} spec: # resourceName: "" networkNamespace: openshift-sriov-network-operator # vlan: "" # spoofChk: "" # ipam: "" # linkState: "" # maxTxRate: "" # minTxRate: "" # vlanQoS: "" # trust: "" # capabilities: ""
SriovNetworkNodePolicy.yaml
apiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetworkNodePolicy metadata: name: $name namespace: openshift-sriov-network-operator annotations: {} spec: # The attributes for Mellanox/Intel based NICs as below. # deviceType: netdevice/vfio-pci # isRdma: true/false deviceType: $deviceType isRdma: $isRdma nicSelector: # The exact physical function name must match the hardware used pfNames: [$pfNames] nodeSelector: node-role.kubernetes.io/$mcp: "" numVfs: $numVfs priority: $priority resourceName: $resourceName
SriovOperatorConfig.yaml
apiVersion: sriovnetwork.openshift.io/v1 kind: SriovOperatorConfig metadata: name: default namespace: openshift-sriov-network-operator annotations: {} spec: configDaemonNodeSelector: "node-role.kubernetes.io/$mcp": "" # Injector and OperatorWebhook pods can be disabled (set to "false") below # to reduce the number of management pods. It is recommended to start with the # webhook and injector pods enabled, and only disable them after verifying the # correctness of user manifests. # If the injector is disabled, containers using sr-iov resources must explicitly assign # them in the "requests"/"limits" section of the container spec, for example: # containers: # - name: my-sriov-workload-container # resources: # limits: # openshift.io/<resource_name>: "1" # requests: # openshift.io/<resource_name>: "1" enableInjector: true enableOperatorWebhook: true logLevel: 0
SriovSubscription.yaml
apiVersion: operators.coreos.com/v1alpha1 kind: Subscription metadata: name: sriov-network-operator-subscription namespace: openshift-sriov-network-operator annotations: {} spec: channel: "stable" name: sriov-network-operator source: redhat-operators-disconnected sourceNamespace: openshift-marketplace installPlanApproval: Manual status: state: AtLatestKnown
SriovSubscriptionNS.yaml
apiVersion: v1 kind: Namespace metadata: name: openshift-sriov-network-operator annotations: workload.openshift.io/allowed: management
SriovSubscriptionOperGroup.yaml
apiVersion: operators.coreos.com/v1 kind: OperatorGroup metadata: name: sriov-network-operators namespace: openshift-sriov-network-operator annotations: {} spec: targetNamespaces: - openshift-sriov-network-operator
3.2.4.4.2. Cluster tuning reference YAML
example-sno.yaml
# example-node1-bmh-secret & assisted-deployment-pull-secret need to be created under same namespace example-sno --- apiVersion: ran.openshift.io/v1 kind: SiteConfig metadata: name: "example-sno" namespace: "example-sno" spec: baseDomain: "example.com" pullSecretRef: name: "assisted-deployment-pull-secret" clusterImageSetNameRef: "openshift-4.10" sshPublicKey: "ssh-rsa AAAA..." clusters: - clusterName: "example-sno" networkType: "OVNKubernetes" # installConfigOverrides is a generic way of passing install-config # parameters through the siteConfig. The 'capabilities' field configures # the composable openshift feature. In this 'capabilities' setting, we # remove all but the marketplace component from the optional set of # components. # Notes: # - OperatorLifecycleManager is needed for 4.15 and later # - NodeTuning is needed for 4.13 and later, not for 4.12 and earlier installConfigOverrides: | { "capabilities": { "baselineCapabilitySet": "None", "additionalEnabledCapabilities": [ "NodeTuning", "OperatorLifecycleManager" ] } } # It is strongly recommended to include crun manifests as part of the additional install-time manifests for 4.13+. # The crun manifests can be obtained from source-crs/optional-extra-manifest/ and added to the git repo ie.sno-extra-manifest. # extraManifestPath: sno-extra-manifest clusterLabels: # These example cluster labels correspond to the bindingRules in the PolicyGenTemplate examples du-profile: "latest" # These example cluster labels correspond to the bindingRules in the PolicyGenTemplate examples in ../policygentemplates: # ../policygentemplates/common-ranGen.yaml will apply to all clusters with 'common: true' common: true # ../policygentemplates/group-du-sno-ranGen.yaml will apply to all clusters with 'group-du-sno: ""' group-du-sno: "" # ../policygentemplates/example-sno-site.yaml will apply to all clusters with 'sites: "example-sno"' # Normally this should match or contain the cluster name so it only applies to a single cluster sites : "example-sno" clusterNetwork: - cidr: 1001:1::/48 hostPrefix: 64 machineNetwork: - cidr: 1111:2222:3333:4444::/64 serviceNetwork: - 1001:2::/112 additionalNTPSources: - 1111:2222:3333:4444::2 # Initiates the cluster for workload partitioning. Setting specific reserved/isolated CPUSets is done via PolicyTemplate # please see Workload Partitioning Feature for a complete guide. cpuPartitioningMode: AllNodes # Optionally; This can be used to override the KlusterletAddonConfig that is created for this cluster: #crTemplates: # KlusterletAddonConfig: "KlusterletAddonConfigOverride.yaml" nodes: - hostName: "example-node1.example.com" role: "master" # Optionally; This can be used to configure desired BIOS setting on a host: #biosConfigRef: # filePath: "example-hw.profile" bmcAddress: "idrac-virtualmedia+https://[1111:2222:3333:4444::bbbb:1]/redfish/v1/Systems/System.Embedded.1" bmcCredentialsName: name: "example-node1-bmh-secret" bootMACAddress: "AA:BB:CC:DD:EE:11" # Use UEFISecureBoot to enable secure boot bootMode: "UEFI" rootDeviceHints: deviceName: "/dev/disk/by-path/pci-0000:01:00.0-scsi-0:2:0:0" # disk partition at `/var/lib/containers` with ignitionConfigOverride. Some values must be updated. See DiskPartitionContainer.md for more details ignitionConfigOverride: | { "ignition": { "version": "3.2.0" }, "storage": { "disks": [ { "device": "/dev/disk/by-path/pci-0000:01:00.0-scsi-0:2:0:0", "partitions": [ { "label": "var-lib-containers", "sizeMiB": 0, "startMiB": 250000 } ], "wipeTable": false } ], "filesystems": [ { "device": "/dev/disk/by-partlabel/var-lib-containers", "format": "xfs", "mountOptions": [ "defaults", "prjquota" ], "path": "/var/lib/containers", "wipeFilesystem": true } ] }, "systemd": { "units": [ { "contents": "# Generated by Butane\n[Unit]\nRequires=systemd-fsck@dev-disk-by\\x2dpartlabel-var\\x2dlib\\x2dcontainers.service\nAfter=systemd-fsck@dev-disk-by\\x2dpartlabel-var\\x2dlib\\x2dcontainers.service\n\n[Mount]\nWhere=/var/lib/containers\nWhat=/dev/disk/by-partlabel/var-lib-containers\nType=xfs\nOptions=defaults,prjquota\n\n[Install]\nRequiredBy=local-fs.target", "enabled": true, "name": "var-lib-containers.mount" } ] } } nodeNetwork: interfaces: - name: eno1 macAddress: "AA:BB:CC:DD:EE:11" config: interfaces: - name: eno1 type: ethernet state: up ipv4: enabled: false ipv6: enabled: true address: # For SNO sites with static IP addresses, the node-specific, # API and Ingress IPs should all be the same and configured on # the interface - ip: 1111:2222:3333:4444::aaaa:1 prefix-length: 64 dns-resolver: config: search: - example.com server: - 1111:2222:3333:4444::2 routes: config: - destination: ::/0 next-hop-interface: eno1 next-hop-address: 1111:2222:3333:4444::1 table-id: 254
DisableSnoNetworkDiag.yaml
apiVersion: operator.openshift.io/v1 kind: Network metadata: name: cluster annotations: {} spec: disableNetworkDiagnostics: true
09-openshift-marketplace-ns.yaml
apiVersion: v1 kind: Namespace metadata: annotations: openshift.io/node-selector: "" workload.openshift.io/allowed: "management" labels: openshift.io/cluster-monitoring: "true" pod-security.kubernetes.io/enforce: baseline pod-security.kubernetes.io/enforce-version: v1.25 pod-security.kubernetes.io/audit: baseline pod-security.kubernetes.io/audit-version: v1.25 pod-security.kubernetes.io/warn: baseline pod-security.kubernetes.io/warn-version: v1.25 name: "openshift-marketplace"
ReduceMonitoringFootprint.yaml
apiVersion: v1 kind: ConfigMap metadata: name: cluster-monitoring-config namespace: openshift-monitoring annotations: {} data: config.yaml: | alertmanagerMain: enabled: false telemeterClient: enabled: false prometheusK8s: retention: 24h
DefaultCatsrc.yaml
apiVersion: operators.coreos.com/v1alpha1 kind: CatalogSource metadata: name: default-cat-source namespace: openshift-marketplace annotations: target.workload.openshift.io/management: '{"effect": "PreferredDuringScheduling"}' spec: displayName: default-cat-source image: $imageUrl publisher: Red Hat sourceType: grpc updateStrategy: registryPoll: interval: 1h status: connectionState: lastObservedState: READY
DisableOLMPprof.yaml
apiVersion: v1 kind: ConfigMap metadata: name: collect-profiles-config namespace: openshift-operator-lifecycle-manager annotations: {} data: pprof-config.yaml: | disabled: True
DisconnectedICSP.yaml
apiVersion: operator.openshift.io/v1alpha1 kind: ImageContentSourcePolicy metadata: name: disconnected-internal-icsp annotations: {} spec: repositoryDigestMirrors: - $mirrors
OperatorHub.yaml
apiVersion: config.openshift.io/v1 kind: OperatorHub metadata: name: cluster annotations: {} spec: disableAllDefaultSources: true
3.2.4.4.3. Machine configuration reference YAML
enable-crun-master.yaml
apiVersion: machineconfiguration.openshift.io/v1 kind: ContainerRuntimeConfig metadata: name: enable-crun-master spec: machineConfigPoolSelector: matchLabels: pools.operator.machineconfiguration.openshift.io/master: "" containerRuntimeConfig: defaultRuntime: crun
enable-crun-worker.yaml
apiVersion: machineconfiguration.openshift.io/v1 kind: ContainerRuntimeConfig metadata: name: enable-crun-worker spec: machineConfigPoolSelector: matchLabels: pools.operator.machineconfiguration.openshift.io/worker: "" containerRuntimeConfig: defaultRuntime: crun
99-crio-disable-wipe-master.yaml
apiVersion: machineconfiguration.openshift.io/v1 kind: MachineConfig metadata: labels: machineconfiguration.openshift.io/role: master name: 99-crio-disable-wipe-master spec: config: ignition: version: 3.2.0 storage: files: - contents: source: data:text/plain;charset=utf-8;base64,W2NyaW9dCmNsZWFuX3NodXRkb3duX2ZpbGUgPSAiIgo= mode: 420 path: /etc/crio/crio.conf.d/99-crio-disable-wipe.toml
99-crio-disable-wipe-worker.yaml
apiVersion: machineconfiguration.openshift.io/v1 kind: MachineConfig metadata: labels: machineconfiguration.openshift.io/role: worker name: 99-crio-disable-wipe-worker spec: config: ignition: version: 3.2.0 storage: files: - contents: source: data:text/plain;charset=utf-8;base64,W2NyaW9dCmNsZWFuX3NodXRkb3duX2ZpbGUgPSAiIgo= mode: 420 path: /etc/crio/crio.conf.d/99-crio-disable-wipe.toml
enable-cgroups-v1.yaml
apiVersion: config.openshift.io/v1 kind: Node metadata: name: cluster spec: cgroupMode: "v1"
05-kdump-config-master.yaml
apiVersion: machineconfiguration.openshift.io/v1 kind: MachineConfig metadata: labels: machineconfiguration.openshift.io/role: master name: 05-kdump-config-master spec: config: ignition: version: 3.2.0 systemd: units: - enabled: true name: kdump-remove-ice-module.service contents: | [Unit] Description=Remove ice module when doing kdump Before=kdump.service [Service] Type=oneshot RemainAfterExit=true ExecStart=/usr/local/bin/kdump-remove-ice-module.sh [Install] WantedBy=multi-user.target storage: files: - contents: source: data:text/plain;charset=utf-8;base64,IyEvdXNyL2Jpbi9lbnYgYmFzaAoKIyBUaGlzIHNjcmlwdCByZW1vdmVzIHRoZSBpY2UgbW9kdWxlIGZyb20ga2R1bXAgdG8gcHJldmVudCBrZHVtcCBmYWlsdXJlcyBvbiBjZXJ0YWluIHNlcnZlcnMuCiMgVGhpcyBpcyBhIHRlbXBvcmFyeSB3b3JrYXJvdW5kIGZvciBSSEVMUExBTi0xMzgyMzYgYW5kIGNhbiBiZSByZW1vdmVkIHdoZW4gdGhhdCBpc3N1ZSBpcwojIGZpeGVkLgoKc2V0IC14CgpTRUQ9Ii91c3IvYmluL3NlZCIKR1JFUD0iL3Vzci9iaW4vZ3JlcCIKCiMgb3ZlcnJpZGUgZm9yIHRlc3RpbmcgcHVycG9zZXMKS0RVTVBfQ09ORj0iJHsxOi0vZXRjL3N5c2NvbmZpZy9rZHVtcH0iClJFTU9WRV9JQ0VfU1RSPSJtb2R1bGVfYmxhY2tsaXN0PWljZSIKCiMgZXhpdCBpZiBmaWxlIGRvZXNuJ3QgZXhpc3QKWyAhIC1mICR7S0RVTVBfQ09ORn0gXSAmJiBleGl0IDAKCiMgZXhpdCBpZiBmaWxlIGFscmVhZHkgdXBkYXRlZAoke0dSRVB9IC1GcSAke1JFTU9WRV9JQ0VfU1RSfSAke0tEVU1QX0NPTkZ9ICYmIGV4aXQgMAoKIyBUYXJnZXQgbGluZSBsb29rcyBzb21ldGhpbmcgbGlrZSB0aGlzOgojIEtEVU1QX0NPTU1BTkRMSU5FX0FQUEVORD0iaXJxcG9sbCBucl9jcHVzPTEgLi4uIGhlc3RfZGlzYWJsZSIKIyBVc2Ugc2VkIHRvIG1hdGNoIGV2ZXJ5dGhpbmcgYmV0d2VlbiB0aGUgcXVvdGVzIGFuZCBhcHBlbmQgdGhlIFJFTU9WRV9JQ0VfU1RSIHRvIGl0CiR7U0VEfSAtaSAncy9eS0RVTVBfQ09NTUFORExJTkVfQVBQRU5EPSJbXiJdKi8mICcke1JFTU9WRV9JQ0VfU1RSfScvJyAke0tEVU1QX0NPTkZ9IHx8IGV4aXQgMAo= mode: 448 path: /usr/local/bin/kdump-remove-ice-module.sh
05-kdump-config-worker.yaml
apiVersion: machineconfiguration.openshift.io/v1 kind: MachineConfig metadata: labels: machineconfiguration.openshift.io/role: worker name: 05-kdump-config-worker spec: config: ignition: version: 3.2.0 systemd: units: - enabled: true name: kdump-remove-ice-module.service contents: | [Unit] Description=Remove ice module when doing kdump Before=kdump.service [Service] Type=oneshot RemainAfterExit=true ExecStart=/usr/local/bin/kdump-remove-ice-module.sh [Install] WantedBy=multi-user.target storage: files: - contents: source: data:text/plain;charset=utf-8;base64,IyEvdXNyL2Jpbi9lbnYgYmFzaAoKIyBUaGlzIHNjcmlwdCByZW1vdmVzIHRoZSBpY2UgbW9kdWxlIGZyb20ga2R1bXAgdG8gcHJldmVudCBrZHVtcCBmYWlsdXJlcyBvbiBjZXJ0YWluIHNlcnZlcnMuCiMgVGhpcyBpcyBhIHRlbXBvcmFyeSB3b3JrYXJvdW5kIGZvciBSSEVMUExBTi0xMzgyMzYgYW5kIGNhbiBiZSByZW1vdmVkIHdoZW4gdGhhdCBpc3N1ZSBpcwojIGZpeGVkLgoKc2V0IC14CgpTRUQ9Ii91c3IvYmluL3NlZCIKR1JFUD0iL3Vzci9iaW4vZ3JlcCIKCiMgb3ZlcnJpZGUgZm9yIHRlc3RpbmcgcHVycG9zZXMKS0RVTVBfQ09ORj0iJHsxOi0vZXRjL3N5c2NvbmZpZy9rZHVtcH0iClJFTU9WRV9JQ0VfU1RSPSJtb2R1bGVfYmxhY2tsaXN0PWljZSIKCiMgZXhpdCBpZiBmaWxlIGRvZXNuJ3QgZXhpc3QKWyAhIC1mICR7S0RVTVBfQ09ORn0gXSAmJiBleGl0IDAKCiMgZXhpdCBpZiBmaWxlIGFscmVhZHkgdXBkYXRlZAoke0dSRVB9IC1GcSAke1JFTU9WRV9JQ0VfU1RSfSAke0tEVU1QX0NPTkZ9ICYmIGV4aXQgMAoKIyBUYXJnZXQgbGluZSBsb29rcyBzb21ldGhpbmcgbGlrZSB0aGlzOgojIEtEVU1QX0NPTU1BTkRMSU5FX0FQUEVORD0iaXJxcG9sbCBucl9jcHVzPTEgLi4uIGhlc3RfZGlzYWJsZSIKIyBVc2Ugc2VkIHRvIG1hdGNoIGV2ZXJ5dGhpbmcgYmV0d2VlbiB0aGUgcXVvdGVzIGFuZCBhcHBlbmQgdGhlIFJFTU9WRV9JQ0VfU1RSIHRvIGl0CiR7U0VEfSAtaSAncy9eS0RVTVBfQ09NTUFORExJTkVfQVBQRU5EPSJbXiJdKi8mICcke1JFTU9WRV9JQ0VfU1RSfScvJyAke0tEVU1QX0NPTkZ9IHx8IGV4aXQgMAo= mode: 448 path: /usr/local/bin/kdump-remove-ice-module.sh
06-kdump-master.yaml
apiVersion: machineconfiguration.openshift.io/v1 kind: MachineConfig metadata: labels: machineconfiguration.openshift.io/role: master name: 06-kdump-enable-master spec: config: ignition: version: 3.2.0 systemd: units: - enabled: true name: kdump.service kernelArguments: - crashkernel=512M
06-kdump-worker.yaml
apiVersion: machineconfiguration.openshift.io/v1 kind: MachineConfig metadata: labels: machineconfiguration.openshift.io/role: worker name: 06-kdump-enable-worker spec: config: ignition: version: 3.2.0 systemd: units: - enabled: true name: kdump.service kernelArguments: - crashkernel=512M
01-container-mount-ns-and-kubelet-conf-master.yaml
apiVersion: machineconfiguration.openshift.io/v1 kind: MachineConfig metadata: labels: machineconfiguration.openshift.io/role: master name: container-mount-namespace-and-kubelet-conf-master spec: config: ignition: version: 3.2.0 storage: files: - contents: source: data:text/plain;charset=utf-8;base64,IyEvYmluL2Jhc2gKCmRlYnVnKCkgewogIGVjaG8gJEAgPiYyCn0KCnVzYWdlKCkgewogIGVjaG8gVXNhZ2U6ICQoYmFzZW5hbWUgJDApIFVOSVQgW2VudmZpbGUgW3Zhcm5hbWVdXQogIGVjaG8KICBlY2hvIEV4dHJhY3QgdGhlIGNvbnRlbnRzIG9mIHRoZSBmaXJzdCBFeGVjU3RhcnQgc3RhbnphIGZyb20gdGhlIGdpdmVuIHN5c3RlbWQgdW5pdCBhbmQgcmV0dXJuIGl0IHRvIHN0ZG91dAogIGVjaG8KICBlY2hvICJJZiAnZW52ZmlsZScgaXMgcHJvdmlkZWQsIHB1dCBpdCBpbiB0aGVyZSBpbnN0ZWFkLCBhcyBhbiBlbnZpcm9ubWVudCB2YXJpYWJsZSBuYW1lZCAndmFybmFtZSciCiAgZWNobyAiRGVmYXVsdCAndmFybmFtZScgaXMgRVhFQ1NUQVJUIGlmIG5vdCBzcGVjaWZpZWQiCiAgZXhpdCAxCn0KClVOSVQ9JDEKRU5WRklMRT0kMgpWQVJOQU1FPSQzCmlmIFtbIC16ICRVTklUIHx8ICRVTklUID09ICItLWhlbHAiIHx8ICRVTklUID09ICItaCIgXV07IHRoZW4KICB1c2FnZQpmaQpkZWJ1ZyAiRXh0cmFjdGluZyBFeGVjU3RhcnQgZnJvbSAkVU5JVCIKRklMRT0kKHN5c3RlbWN0bCBjYXQgJFVOSVQgfCBoZWFkIC1uIDEpCkZJTEU9JHtGSUxFI1wjIH0KaWYgW1sgISAtZiAkRklMRSBdXTsgdGhlbgogIGRlYnVnICJGYWlsZWQgdG8gZmluZCByb290IGZpbGUgZm9yIHVuaXQgJFVOSVQgKCRGSUxFKSIKICBleGl0CmZpCmRlYnVnICJTZXJ2aWNlIGRlZmluaXRpb24gaXMgaW4gJEZJTEUiCkVYRUNTVEFSVD0kKHNlZCAtbiAtZSAnL15FeGVjU3RhcnQ9LipcXCQvLC9bXlxcXSQvIHsgcy9eRXhlY1N0YXJ0PS8vOyBwIH0nIC1lICcvXkV4ZWNTdGFydD0uKlteXFxdJC8geyBzL15FeGVjU3RhcnQ9Ly87IHAgfScgJEZJTEUpCgppZiBbWyAkRU5WRklMRSBdXTsgdGhlbgogIFZBUk5BTUU9JHtWQVJOQU1FOi1FWEVDU1RBUlR9CiAgZWNobyAiJHtWQVJOQU1FfT0ke0VYRUNTVEFSVH0iID4gJEVOVkZJTEUKZWxzZQogIGVjaG8gJEVYRUNTVEFSVApmaQo= mode: 493 path: /usr/local/bin/extractExecStart - contents: source: data:text/plain;charset=utf-8;base64,IyEvYmluL2Jhc2gKbnNlbnRlciAtLW1vdW50PS9ydW4vY29udGFpbmVyLW1vdW50LW5hbWVzcGFjZS9tbnQgIiRAIgo= mode: 493 path: /usr/local/bin/nsenterCmns systemd: units: - contents: | [Unit] Description=Manages a mount namespace that both kubelet and crio can use to share their container-specific mounts [Service] Type=oneshot RemainAfterExit=yes RuntimeDirectory=container-mount-namespace Environment=RUNTIME_DIRECTORY=%t/container-mount-namespace Environment=BIND_POINT=%t/container-mount-namespace/mnt ExecStartPre=bash -c "findmnt ${RUNTIME_DIRECTORY} || mount --make-unbindable --bind ${RUNTIME_DIRECTORY} ${RUNTIME_DIRECTORY}" ExecStartPre=touch ${BIND_POINT} ExecStart=unshare --mount=${BIND_POINT} --propagation slave mount --make-rshared / ExecStop=umount -R ${RUNTIME_DIRECTORY} name: container-mount-namespace.service - dropins: - contents: | [Unit] Wants=container-mount-namespace.service After=container-mount-namespace.service [Service] ExecStartPre=/usr/local/bin/extractExecStart %n /%t/%N-execstart.env ORIG_EXECSTART EnvironmentFile=-/%t/%N-execstart.env ExecStart= ExecStart=bash -c "nsenter --mount=%t/container-mount-namespace/mnt \ ${ORIG_EXECSTART}" name: 90-container-mount-namespace.conf name: crio.service - dropins: - contents: | [Unit] Wants=container-mount-namespace.service After=container-mount-namespace.service [Service] ExecStartPre=/usr/local/bin/extractExecStart %n /%t/%N-execstart.env ORIG_EXECSTART EnvironmentFile=-/%t/%N-execstart.env ExecStart= ExecStart=bash -c "nsenter --mount=%t/container-mount-namespace/mnt \ ${ORIG_EXECSTART} --housekeeping-interval=30s" name: 90-container-mount-namespace.conf - contents: | [Service] Environment="OPENSHIFT_MAX_HOUSEKEEPING_INTERVAL_DURATION=60s" Environment="OPENSHIFT_EVICTION_MONITORING_PERIOD_DURATION=30s" name: 30-kubelet-interval-tuning.conf name: kubelet.service
01-container-mount-ns-and-kubelet-conf-worker.yaml
apiVersion: machineconfiguration.openshift.io/v1 kind: MachineConfig metadata: labels: machineconfiguration.openshift.io/role: worker name: container-mount-namespace-and-kubelet-conf-worker spec: config: ignition: version: 3.2.0 storage: files: - contents: source: data:text/plain;charset=utf-8;base64,IyEvYmluL2Jhc2gKCmRlYnVnKCkgewogIGVjaG8gJEAgPiYyCn0KCnVzYWdlKCkgewogIGVjaG8gVXNhZ2U6ICQoYmFzZW5hbWUgJDApIFVOSVQgW2VudmZpbGUgW3Zhcm5hbWVdXQogIGVjaG8KICBlY2hvIEV4dHJhY3QgdGhlIGNvbnRlbnRzIG9mIHRoZSBmaXJzdCBFeGVjU3RhcnQgc3RhbnphIGZyb20gdGhlIGdpdmVuIHN5c3RlbWQgdW5pdCBhbmQgcmV0dXJuIGl0IHRvIHN0ZG91dAogIGVjaG8KICBlY2hvICJJZiAnZW52ZmlsZScgaXMgcHJvdmlkZWQsIHB1dCBpdCBpbiB0aGVyZSBpbnN0ZWFkLCBhcyBhbiBlbnZpcm9ubWVudCB2YXJpYWJsZSBuYW1lZCAndmFybmFtZSciCiAgZWNobyAiRGVmYXVsdCAndmFybmFtZScgaXMgRVhFQ1NUQVJUIGlmIG5vdCBzcGVjaWZpZWQiCiAgZXhpdCAxCn0KClVOSVQ9JDEKRU5WRklMRT0kMgpWQVJOQU1FPSQzCmlmIFtbIC16ICRVTklUIHx8ICRVTklUID09ICItLWhlbHAiIHx8ICRVTklUID09ICItaCIgXV07IHRoZW4KICB1c2FnZQpmaQpkZWJ1ZyAiRXh0cmFjdGluZyBFeGVjU3RhcnQgZnJvbSAkVU5JVCIKRklMRT0kKHN5c3RlbWN0bCBjYXQgJFVOSVQgfCBoZWFkIC1uIDEpCkZJTEU9JHtGSUxFI1wjIH0KaWYgW1sgISAtZiAkRklMRSBdXTsgdGhlbgogIGRlYnVnICJGYWlsZWQgdG8gZmluZCByb290IGZpbGUgZm9yIHVuaXQgJFVOSVQgKCRGSUxFKSIKICBleGl0CmZpCmRlYnVnICJTZXJ2aWNlIGRlZmluaXRpb24gaXMgaW4gJEZJTEUiCkVYRUNTVEFSVD0kKHNlZCAtbiAtZSAnL15FeGVjU3RhcnQ9LipcXCQvLC9bXlxcXSQvIHsgcy9eRXhlY1N0YXJ0PS8vOyBwIH0nIC1lICcvXkV4ZWNTdGFydD0uKlteXFxdJC8geyBzL15FeGVjU3RhcnQ9Ly87IHAgfScgJEZJTEUpCgppZiBbWyAkRU5WRklMRSBdXTsgdGhlbgogIFZBUk5BTUU9JHtWQVJOQU1FOi1FWEVDU1RBUlR9CiAgZWNobyAiJHtWQVJOQU1FfT0ke0VYRUNTVEFSVH0iID4gJEVOVkZJTEUKZWxzZQogIGVjaG8gJEVYRUNTVEFSVApmaQo= mode: 493 path: /usr/local/bin/extractExecStart - contents: source: data:text/plain;charset=utf-8;base64,IyEvYmluL2Jhc2gKbnNlbnRlciAtLW1vdW50PS9ydW4vY29udGFpbmVyLW1vdW50LW5hbWVzcGFjZS9tbnQgIiRAIgo= mode: 493 path: /usr/local/bin/nsenterCmns systemd: units: - contents: | [Unit] Description=Manages a mount namespace that both kubelet and crio can use to share their container-specific mounts [Service] Type=oneshot RemainAfterExit=yes RuntimeDirectory=container-mount-namespace Environment=RUNTIME_DIRECTORY=%t/container-mount-namespace Environment=BIND_POINT=%t/container-mount-namespace/mnt ExecStartPre=bash -c "findmnt ${RUNTIME_DIRECTORY} || mount --make-unbindable --bind ${RUNTIME_DIRECTORY} ${RUNTIME_DIRECTORY}" ExecStartPre=touch ${BIND_POINT} ExecStart=unshare --mount=${BIND_POINT} --propagation slave mount --make-rshared / ExecStop=umount -R ${RUNTIME_DIRECTORY} name: container-mount-namespace.service - dropins: - contents: | [Unit] Wants=container-mount-namespace.service After=container-mount-namespace.service [Service] ExecStartPre=/usr/local/bin/extractExecStart %n /%t/%N-execstart.env ORIG_EXECSTART EnvironmentFile=-/%t/%N-execstart.env ExecStart= ExecStart=bash -c "nsenter --mount=%t/container-mount-namespace/mnt \ ${ORIG_EXECSTART}" name: 90-container-mount-namespace.conf name: crio.service - dropins: - contents: | [Unit] Wants=container-mount-namespace.service After=container-mount-namespace.service [Service] ExecStartPre=/usr/local/bin/extractExecStart %n /%t/%N-execstart.env ORIG_EXECSTART EnvironmentFile=-/%t/%N-execstart.env ExecStart= ExecStart=bash -c "nsenter --mount=%t/container-mount-namespace/mnt \ ${ORIG_EXECSTART} --housekeeping-interval=30s" name: 90-container-mount-namespace.conf - contents: | [Service] Environment="OPENSHIFT_MAX_HOUSEKEEPING_INTERVAL_DURATION=60s" Environment="OPENSHIFT_EVICTION_MONITORING_PERIOD_DURATION=30s" name: 30-kubelet-interval-tuning.conf name: kubelet.service
99-sync-time-once-master.yaml
apiVersion: machineconfiguration.openshift.io/v1 kind: MachineConfig metadata: labels: machineconfiguration.openshift.io/role: master name: 99-sync-time-once-master spec: config: ignition: version: 3.2.0 systemd: units: - contents: | [Unit] Description=Sync time once After=network.service [Service] Type=oneshot TimeoutStartSec=300 ExecCondition=/bin/bash -c 'systemctl is-enabled chronyd.service --quiet && exit 1 || exit 0' ExecStart=/usr/sbin/chronyd -n -f /etc/chrony.conf -q RemainAfterExit=yes [Install] WantedBy=multi-user.target enabled: true name: sync-time-once.service
99-sync-time-once-worker.yaml
apiVersion: machineconfiguration.openshift.io/v1 kind: MachineConfig metadata: labels: machineconfiguration.openshift.io/role: worker name: 99-sync-time-once-worker spec: config: ignition: version: 3.2.0 systemd: units: - contents: | [Unit] Description=Sync time once After=network.service [Service] Type=oneshot TimeoutStartSec=300 ExecCondition=/bin/bash -c 'systemctl is-enabled chronyd.service --quiet && exit 1 || exit 0' ExecStart=/usr/sbin/chronyd -n -f /etc/chrony.conf -q RemainAfterExit=yes [Install] WantedBy=multi-user.target enabled: true name: sync-time-once.service
03-sctp-machine-config-master.yaml
apiVersion: machineconfiguration.openshift.io/v1 kind: MachineConfig metadata: labels: machineconfiguration.openshift.io/role: master name: load-sctp-module-master spec: config: ignition: version: 2.2.0 storage: files: - contents: source: data:, verification: {} filesystem: root mode: 420 path: /etc/modprobe.d/sctp-blacklist.conf - contents: source: data:text/plain;charset=utf-8,sctp filesystem: root mode: 420 path: /etc/modules-load.d/sctp-load.conf
03-sctp-machine-config-worker.yaml
apiVersion: machineconfiguration.openshift.io/v1 kind: MachineConfig metadata: labels: machineconfiguration.openshift.io/role: worker name: load-sctp-module-worker spec: config: ignition: version: 2.2.0 storage: files: - contents: source: data:, verification: {} filesystem: root mode: 420 path: /etc/modprobe.d/sctp-blacklist.conf - contents: source: data:text/plain;charset=utf-8,sctp filesystem: root mode: 420 path: /etc/modules-load.d/sctp-load.conf
08-set-rcu-normal-master.yaml
apiVersion: machineconfiguration.openshift.io/v1 kind: MachineConfig metadata: labels: machineconfiguration.openshift.io/role: master name: 08-set-rcu-normal-master spec: config: ignition: version: 3.2.0 storage: files: - contents: source: data:text/plain;charset=utf-8;base64,IyEvYmluL2Jhc2gKIwojIERpc2FibGUgcmN1X2V4cGVkaXRlZCBhZnRlciBub2RlIGhhcyBmaW5pc2hlZCBib290aW5nCiMKIyBUaGUgZGVmYXVsdHMgYmVsb3cgY2FuIGJlIG92ZXJyaWRkZW4gdmlhIGVudmlyb25tZW50IHZhcmlhYmxlcwojCgojIERlZmF1bHQgd2FpdCB0aW1lIGlzIDYwMHMgPSAxMG06Ck1BWElNVU1fV0FJVF9USU1FPSR7TUFYSU1VTV9XQUlUX1RJTUU6LTYwMH0KCiMgRGVmYXVsdCBzdGVhZHktc3RhdGUgdGhyZXNob2xkID0gMiUKIyBBbGxvd2VkIHZhbHVlczoKIyAgNCAgLSBhYnNvbHV0ZSBwb2QgY291bnQgKCsvLSkKIyAgNCUgLSBwZXJjZW50IGNoYW5nZSAoKy8tKQojICAtMSAtIGRpc2FibGUgdGhlIHN0ZWFkeS1zdGF0ZSBjaGVjawpTVEVBRFlfU1RBVEVfVEhSRVNIT0xEPSR7U1RFQURZX1NUQVRFX1RIUkVTSE9MRDotMiV9CgojIERlZmF1bHQgc3RlYWR5LXN0YXRlIHdpbmRvdyA9IDYwcwojIElmIHRoZSBydW5uaW5nIHBvZCBjb3VudCBzdGF5cyB3aXRoaW4gdGhlIGdpdmVuIHRocmVzaG9sZCBmb3IgdGhpcyB0aW1lCiMgcGVyaW9kLCByZXR1cm4gQ1BVIHV0aWxpemF0aW9uIHRvIG5vcm1hbCBiZWZvcmUgdGhlIG1heGltdW0gd2FpdCB0aW1lIGhhcwojIGV4cGlyZXMKU1RFQURZX1NUQVRFX1dJTkRPVz0ke1NURUFEWV9TVEFURV9XSU5ET1c6LTYwfQoKIyBEZWZhdWx0IHN0ZWFkeS1zdGF0ZSBhbGxvd3MgYW55IHBvZCBjb3VudCB0byBiZSAic3RlYWR5IHN0YXRlIgojIEluY3JlYXNpbmcgdGhpcyB3aWxsIHNraXAgYW55IHN0ZWFkeS1zdGF0ZSBjaGVja3MgdW50aWwgdGhlIGNvdW50IHJpc2VzIGFib3ZlCiMgdGhpcyBudW1iZXIgdG8gYXZvaWQgZmFsc2UgcG9zaXRpdmVzIGlmIHRoZXJlIGFyZSBzb21lIHBlcmlvZHMgd2hlcmUgdGhlCiMgY291bnQgZG9lc24ndCBpbmNyZWFzZSBidXQgd2Uga25vdyB3ZSBjYW4ndCBiZSBhdCBzdGVhZHktc3RhdGUgeWV0LgpTVEVBRFlfU1RBVEVfTUlOSU1VTT0ke1NURUFEWV9TVEFURV9NSU5JTVVNOi0wfQoKIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIwoKd2l0aGluKCkgewogIGxvY2FsIGxhc3Q9JDEgY3VycmVudD0kMiB0aHJlc2hvbGQ9JDMKICBsb2NhbCBkZWx0YT0wIHBjaGFuZ2UKICBkZWx0YT0kKCggY3VycmVudCAtIGxhc3QgKSkKICBpZiBbWyAkY3VycmVudCAtZXEgJGxhc3QgXV07IHRoZW4KICAgIHBjaGFuZ2U9MAogIGVsaWYgW1sgJGxhc3QgLWVxIDAgXV07IHRoZW4KICAgIHBjaGFuZ2U9MTAwMDAwMAogIGVsc2UKICAgIHBjaGFuZ2U9JCgoICggIiRkZWx0YSIgKiAxMDApIC8gbGFzdCApKQogIGZpCiAgZWNobyAtbiAibGFzdDokbGFzdCBjdXJyZW50OiRjdXJyZW50IGRlbHRhOiRkZWx0YSBwY2hhbmdlOiR7cGNoYW5nZX0lOiAiCiAgbG9jYWwgYWJzb2x1dGUgbGltaXQKICBjYXNlICR0aHJlc2hvbGQgaW4KICAgIColKQogICAgICBhYnNvbHV0ZT0ke3BjaGFuZ2UjIy19ICMgYWJzb2x1dGUgdmFsdWUKICAgICAgbGltaXQ9JHt0aHJlc2hvbGQlJSV9CiAgICAgIDs7CiAgICAqKQogICAgICBhYnNvbHV0ZT0ke2RlbHRhIyMtfSAjIGFic29sdXRlIHZhbHVlCiAgICAgIGxpbWl0PSR0aHJlc2hvbGQKICAgICAgOzsKICBlc2FjCiAgaWYgW1sgJGFic29sdXRlIC1sZSAkbGltaXQgXV07IHRoZW4KICAgIGVjaG8gIndpdGhpbiAoKy8tKSR0aHJlc2hvbGQiCiAgICByZXR1cm4gMAogIGVsc2UKICAgIGVjaG8gIm91dHNpZGUgKCsvLSkkdGhyZXNob2xkIgogICAgcmV0dXJuIDEKICBmaQp9CgpzdGVhZHlzdGF0ZSgpIHsKICBsb2NhbCBsYXN0PSQxIGN1cnJlbnQ9JDIKICBpZiBbWyAkbGFzdCAtbHQgJFNURUFEWV9TVEFURV9NSU5JTVVNIF1dOyB0aGVuCiAgICBlY2hvICJsYXN0OiRsYXN0IGN1cnJlbnQ6JGN1cnJlbnQgV2FpdGluZyB0byByZWFjaCAkU1RFQURZX1NUQVRFX01JTklNVU0gYmVmb3JlIGNoZWNraW5nIGZvciBzdGVhZHktc3RhdGUiCiAgICByZXR1cm4gMQogIGZpCiAgd2l0aGluICIkbGFzdCIgIiRjdXJyZW50IiAiJFNURUFEWV9TVEFURV9USFJFU0hPTEQiCn0KCndhaXRGb3JSZWFkeSgpIHsKICBsb2dnZXIgIlJlY292ZXJ5OiBXYWl0aW5nICR7TUFYSU1VTV9XQUlUX1RJTUV9cyBmb3IgdGhlIGluaXRpYWxpemF0aW9uIHRvIGNvbXBsZXRlIgogIGxvY2FsIHQ9MCBzPTEwCiAgbG9jYWwgbGFzdENjb3VudD0wIGNjb3VudD0wIHN0ZWFkeVN0YXRlVGltZT0wCiAgd2hpbGUgW1sgJHQgLWx0ICRNQVhJTVVNX1dBSVRfVElNRSBdXTsgZG8KICAgIHNsZWVwICRzCiAgICAoKHQgKz0gcykpCiAgICAjIERldGVjdCBzdGVhZHktc3RhdGUgcG9kIGNvdW50CiAgICBjY291bnQ9JChjcmljdGwgcHMgMj4vZGV2L251bGwgfCB3YyAtbCkKICAgIGlmIFtbICRjY291bnQgLWd0IDAgXV0gJiYgc3RlYWR5c3RhdGUgIiRsYXN0Q2NvdW50IiAiJGNjb3VudCI7IHRoZW4KICAgICAgKChzdGVhZHlTdGF0ZVRpbWUgKz0gcykpCiAgICAgIGVjaG8gIlN0ZWFkeS1zdGF0ZSBmb3IgJHtzdGVhZHlTdGF0ZVRpbWV9cy8ke1NURUFEWV9TVEFURV9XSU5ET1d9cyIKICAgICAgaWYgW1sgJHN0ZWFkeVN0YXRlVGltZSAtZ2UgJFNURUFEWV9TVEFURV9XSU5ET1cgXV07IHRoZW4KICAgICAgICBsb2dnZXIgIlJlY292ZXJ5OiBTdGVhZHktc3RhdGUgKCsvLSAkU1RFQURZX1NUQVRFX1RIUkVTSE9MRCkgZm9yICR7U1RFQURZX1NUQVRFX1dJTkRPV31zOiBEb25lIgogICAgICAgIHJldHVybiAwCiAgICAgIGZpCiAgICBlbHNlCiAgICAgIGlmIFtbICRzdGVhZHlTdGF0ZVRpbWUgLWd0IDAgXV07IHRoZW4KICAgICAgICBlY2hvICJSZXNldHRpbmcgc3RlYWR5LXN0YXRlIHRpbWVyIgogICAgICAgIHN0ZWFkeVN0YXRlVGltZT0wCiAgICAgIGZpCiAgICBmaQogICAgbGFzdENjb3VudD0kY2NvdW50CiAgZG9uZQogIGxvZ2dlciAiUmVjb3Zlcnk6IFJlY292ZXJ5IENvbXBsZXRlIFRpbWVvdXQiCn0KCnNldFJjdU5vcm1hbCgpIHsKICBlY2hvICJTZXR0aW5nIHJjdV9ub3JtYWwgdG8gMSIKICBlY2hvIDEgPiAvc3lzL2tlcm5lbC9yY3Vfbm9ybWFsCn0KCm1haW4oKSB7CiAgd2FpdEZvclJlYWR5CiAgZWNobyAiV2FpdGluZyBmb3Igc3RlYWR5IHN0YXRlIHRvb2s6ICQoYXdrICd7cHJpbnQgaW50KCQxLzM2MDApImgiLCBpbnQoKCQxJTM2MDApLzYwKSJtIiwgaW50KCQxJTYwKSJzIn0nIC9wcm9jL3VwdGltZSkiCiAgc2V0UmN1Tm9ybWFsCn0KCmlmIFtbICIke0JBU0hfU09VUkNFWzBdfSIgPSAiJHswfSIgXV07IHRoZW4KICBtYWluICIke0B9IgogIGV4aXQgJD8KZmkK mode: 493 path: /usr/local/bin/set-rcu-normal.sh systemd: units: - contents: | [Unit] Description=Disable rcu_expedited after node has finished booting by setting rcu_normal to 1 [Service] Type=simple ExecStart=/usr/local/bin/set-rcu-normal.sh # Maximum wait time is 600s = 10m: Environment=MAXIMUM_WAIT_TIME=600 # Steady-state threshold = 2% # Allowed values: # 4 - absolute pod count (+/-) # 4% - percent change (+/-) # -1 - disable the steady-state check # Note: '%' must be escaped as '%%' in systemd unit files Environment=STEADY_STATE_THRESHOLD=2%% # Steady-state window = 120s # If the running pod count stays within the given threshold for this time # period, return CPU utilization to normal before the maximum wait time has # expires Environment=STEADY_STATE_WINDOW=120 # Steady-state minimum = 40 # Increasing this will skip any steady-state checks until the count rises above # this number to avoid false positives if there are some periods where the # count doesn't increase but we know we can't be at steady-state yet. Environment=STEADY_STATE_MINIMUM=40 [Install] WantedBy=multi-user.target enabled: true name: set-rcu-normal.service
08-set-rcu-normal-worker.yaml
apiVersion: machineconfiguration.openshift.io/v1 kind: MachineConfig metadata: labels: machineconfiguration.openshift.io/role: worker name: 08-set-rcu-normal-worker spec: config: ignition: version: 3.2.0 storage: files: - contents: source: data:text/plain;charset=utf-8;base64,IyEvYmluL2Jhc2gKIwojIERpc2FibGUgcmN1X2V4cGVkaXRlZCBhZnRlciBub2RlIGhhcyBmaW5pc2hlZCBib290aW5nCiMKIyBUaGUgZGVmYXVsdHMgYmVsb3cgY2FuIGJlIG92ZXJyaWRkZW4gdmlhIGVudmlyb25tZW50IHZhcmlhYmxlcwojCgojIERlZmF1bHQgd2FpdCB0aW1lIGlzIDYwMHMgPSAxMG06Ck1BWElNVU1fV0FJVF9USU1FPSR7TUFYSU1VTV9XQUlUX1RJTUU6LTYwMH0KCiMgRGVmYXVsdCBzdGVhZHktc3RhdGUgdGhyZXNob2xkID0gMiUKIyBBbGxvd2VkIHZhbHVlczoKIyAgNCAgLSBhYnNvbHV0ZSBwb2QgY291bnQgKCsvLSkKIyAgNCUgLSBwZXJjZW50IGNoYW5nZSAoKy8tKQojICAtMSAtIGRpc2FibGUgdGhlIHN0ZWFkeS1zdGF0ZSBjaGVjawpTVEVBRFlfU1RBVEVfVEhSRVNIT0xEPSR7U1RFQURZX1NUQVRFX1RIUkVTSE9MRDotMiV9CgojIERlZmF1bHQgc3RlYWR5LXN0YXRlIHdpbmRvdyA9IDYwcwojIElmIHRoZSBydW5uaW5nIHBvZCBjb3VudCBzdGF5cyB3aXRoaW4gdGhlIGdpdmVuIHRocmVzaG9sZCBmb3IgdGhpcyB0aW1lCiMgcGVyaW9kLCByZXR1cm4gQ1BVIHV0aWxpemF0aW9uIHRvIG5vcm1hbCBiZWZvcmUgdGhlIG1heGltdW0gd2FpdCB0aW1lIGhhcwojIGV4cGlyZXMKU1RFQURZX1NUQVRFX1dJTkRPVz0ke1NURUFEWV9TVEFURV9XSU5ET1c6LTYwfQoKIyBEZWZhdWx0IHN0ZWFkeS1zdGF0ZSBhbGxvd3MgYW55IHBvZCBjb3VudCB0byBiZSAic3RlYWR5IHN0YXRlIgojIEluY3JlYXNpbmcgdGhpcyB3aWxsIHNraXAgYW55IHN0ZWFkeS1zdGF0ZSBjaGVja3MgdW50aWwgdGhlIGNvdW50IHJpc2VzIGFib3ZlCiMgdGhpcyBudW1iZXIgdG8gYXZvaWQgZmFsc2UgcG9zaXRpdmVzIGlmIHRoZXJlIGFyZSBzb21lIHBlcmlvZHMgd2hlcmUgdGhlCiMgY291bnQgZG9lc24ndCBpbmNyZWFzZSBidXQgd2Uga25vdyB3ZSBjYW4ndCBiZSBhdCBzdGVhZHktc3RhdGUgeWV0LgpTVEVBRFlfU1RBVEVfTUlOSU1VTT0ke1NURUFEWV9TVEFURV9NSU5JTVVNOi0wfQoKIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIwoKd2l0aGluKCkgewogIGxvY2FsIGxhc3Q9JDEgY3VycmVudD0kMiB0aHJlc2hvbGQ9JDMKICBsb2NhbCBkZWx0YT0wIHBjaGFuZ2UKICBkZWx0YT0kKCggY3VycmVudCAtIGxhc3QgKSkKICBpZiBbWyAkY3VycmVudCAtZXEgJGxhc3QgXV07IHRoZW4KICAgIHBjaGFuZ2U9MAogIGVsaWYgW1sgJGxhc3QgLWVxIDAgXV07IHRoZW4KICAgIHBjaGFuZ2U9MTAwMDAwMAogIGVsc2UKICAgIHBjaGFuZ2U9JCgoICggIiRkZWx0YSIgKiAxMDApIC8gbGFzdCApKQogIGZpCiAgZWNobyAtbiAibGFzdDokbGFzdCBjdXJyZW50OiRjdXJyZW50IGRlbHRhOiRkZWx0YSBwY2hhbmdlOiR7cGNoYW5nZX0lOiAiCiAgbG9jYWwgYWJzb2x1dGUgbGltaXQKICBjYXNlICR0aHJlc2hvbGQgaW4KICAgIColKQogICAgICBhYnNvbHV0ZT0ke3BjaGFuZ2UjIy19ICMgYWJzb2x1dGUgdmFsdWUKICAgICAgbGltaXQ9JHt0aHJlc2hvbGQlJSV9CiAgICAgIDs7CiAgICAqKQogICAgICBhYnNvbHV0ZT0ke2RlbHRhIyMtfSAjIGFic29sdXRlIHZhbHVlCiAgICAgIGxpbWl0PSR0aHJlc2hvbGQKICAgICAgOzsKICBlc2FjCiAgaWYgW1sgJGFic29sdXRlIC1sZSAkbGltaXQgXV07IHRoZW4KICAgIGVjaG8gIndpdGhpbiAoKy8tKSR0aHJlc2hvbGQiCiAgICByZXR1cm4gMAogIGVsc2UKICAgIGVjaG8gIm91dHNpZGUgKCsvLSkkdGhyZXNob2xkIgogICAgcmV0dXJuIDEKICBmaQp9CgpzdGVhZHlzdGF0ZSgpIHsKICBsb2NhbCBsYXN0PSQxIGN1cnJlbnQ9JDIKICBpZiBbWyAkbGFzdCAtbHQgJFNURUFEWV9TVEFURV9NSU5JTVVNIF1dOyB0aGVuCiAgICBlY2hvICJsYXN0OiRsYXN0IGN1cnJlbnQ6JGN1cnJlbnQgV2FpdGluZyB0byByZWFjaCAkU1RFQURZX1NUQVRFX01JTklNVU0gYmVmb3JlIGNoZWNraW5nIGZvciBzdGVhZHktc3RhdGUiCiAgICByZXR1cm4gMQogIGZpCiAgd2l0aGluICIkbGFzdCIgIiRjdXJyZW50IiAiJFNURUFEWV9TVEFURV9USFJFU0hPTEQiCn0KCndhaXRGb3JSZWFkeSgpIHsKICBsb2dnZXIgIlJlY292ZXJ5OiBXYWl0aW5nICR7TUFYSU1VTV9XQUlUX1RJTUV9cyBmb3IgdGhlIGluaXRpYWxpemF0aW9uIHRvIGNvbXBsZXRlIgogIGxvY2FsIHQ9MCBzPTEwCiAgbG9jYWwgbGFzdENjb3VudD0wIGNjb3VudD0wIHN0ZWFkeVN0YXRlVGltZT0wCiAgd2hpbGUgW1sgJHQgLWx0ICRNQVhJTVVNX1dBSVRfVElNRSBdXTsgZG8KICAgIHNsZWVwICRzCiAgICAoKHQgKz0gcykpCiAgICAjIERldGVjdCBzdGVhZHktc3RhdGUgcG9kIGNvdW50CiAgICBjY291bnQ9JChjcmljdGwgcHMgMj4vZGV2L251bGwgfCB3YyAtbCkKICAgIGlmIFtbICRjY291bnQgLWd0IDAgXV0gJiYgc3RlYWR5c3RhdGUgIiRsYXN0Q2NvdW50IiAiJGNjb3VudCI7IHRoZW4KICAgICAgKChzdGVhZHlTdGF0ZVRpbWUgKz0gcykpCiAgICAgIGVjaG8gIlN0ZWFkeS1zdGF0ZSBmb3IgJHtzdGVhZHlTdGF0ZVRpbWV9cy8ke1NURUFEWV9TVEFURV9XSU5ET1d9cyIKICAgICAgaWYgW1sgJHN0ZWFkeVN0YXRlVGltZSAtZ2UgJFNURUFEWV9TVEFURV9XSU5ET1cgXV07IHRoZW4KICAgICAgICBsb2dnZXIgIlJlY292ZXJ5OiBTdGVhZHktc3RhdGUgKCsvLSAkU1RFQURZX1NUQVRFX1RIUkVTSE9MRCkgZm9yICR7U1RFQURZX1NUQVRFX1dJTkRPV31zOiBEb25lIgogICAgICAgIHJldHVybiAwCiAgICAgIGZpCiAgICBlbHNlCiAgICAgIGlmIFtbICRzdGVhZHlTdGF0ZVRpbWUgLWd0IDAgXV07IHRoZW4KICAgICAgICBlY2hvICJSZXNldHRpbmcgc3RlYWR5LXN0YXRlIHRpbWVyIgogICAgICAgIHN0ZWFkeVN0YXRlVGltZT0wCiAgICAgIGZpCiAgICBmaQogICAgbGFzdENjb3VudD0kY2NvdW50CiAgZG9uZQogIGxvZ2dlciAiUmVjb3Zlcnk6IFJlY292ZXJ5IENvbXBsZXRlIFRpbWVvdXQiCn0KCnNldFJjdU5vcm1hbCgpIHsKICBlY2hvICJTZXR0aW5nIHJjdV9ub3JtYWwgdG8gMSIKICBlY2hvIDEgPiAvc3lzL2tlcm5lbC9yY3Vfbm9ybWFsCn0KCm1haW4oKSB7CiAgd2FpdEZvclJlYWR5CiAgZWNobyAiV2FpdGluZyBmb3Igc3RlYWR5IHN0YXRlIHRvb2s6ICQoYXdrICd7cHJpbnQgaW50KCQxLzM2MDApImgiLCBpbnQoKCQxJTM2MDApLzYwKSJtIiwgaW50KCQxJTYwKSJzIn0nIC9wcm9jL3VwdGltZSkiCiAgc2V0UmN1Tm9ybWFsCn0KCmlmIFtbICIke0JBU0hfU09VUkNFWzBdfSIgPSAiJHswfSIgXV07IHRoZW4KICBtYWluICIke0B9IgogIGV4aXQgJD8KZmkK mode: 493 path: /usr/local/bin/set-rcu-normal.sh systemd: units: - contents: | [Unit] Description=Disable rcu_expedited after node has finished booting by setting rcu_normal to 1 [Service] Type=simple ExecStart=/usr/local/bin/set-rcu-normal.sh # Maximum wait time is 600s = 10m: Environment=MAXIMUM_WAIT_TIME=600 # Steady-state threshold = 2% # Allowed values: # 4 - absolute pod count (+/-) # 4% - percent change (+/-) # -1 - disable the steady-state check # Note: '%' must be escaped as '%%' in systemd unit files Environment=STEADY_STATE_THRESHOLD=2%% # Steady-state window = 120s # If the running pod count stays within the given threshold for this time # period, return CPU utilization to normal before the maximum wait time has # expires Environment=STEADY_STATE_WINDOW=120 # Steady-state minimum = 40 # Increasing this will skip any steady-state checks until the count rises above # this number to avoid false positives if there are some periods where the # count doesn't increase but we know we can't be at steady-state yet. Environment=STEADY_STATE_MINIMUM=40 [Install] WantedBy=multi-user.target enabled: true name: set-rcu-normal.service
3.2.5. Telco RAN DU reference configuration software specifications
The following information describes the telco RAN DU reference design specification (RDS) validated software versions.
3.2.5.1. Telco RAN DU 4.15 validated software components
The Red Hat telco RAN DU 4.15 solution has been validated using the following Red Hat software products for OpenShift Container Platform managed clusters and hub clusters.
Component | Software version |
---|---|
Managed cluster version | 4.15 |
Cluster Logging Operator | 5.8 |
Local Storage Operator | 4.15 |
PTP Operator | 4.15 |
SRIOV Operator | 4.15 |
Node Tuning Operator | 4.15 |
Logging Operator | 4.15 |
SRIOV-FEC Operator | 2.8 |
Component | Software version |
---|---|
Hub cluster version | 4.15 |
GitOps ZTP plugin | 4.15 |
Red Hat Advanced Cluster Management (RHACM) | 2.9, 2.10 |
Red Hat OpenShift GitOps | 1.11 |
Topology Aware Lifecycle Manager (TALM) | 4.15 |
3.3. Telco core reference design specification
3.3.1. Telco core 4.15 reference design overview
The telco core reference design specification (RDS) configures a OpenShift Container Platform cluster running on commodity hardware to host telco core workloads.
3.3.1.1. OpenShift Container Platform 4.15 features for telco core
The following features that are included in OpenShift Container Platform 4.15 and are leveraged by the telco core reference design specification (RDS) have been added or updated.
Feature | Description |
---|---|
Multi-network policy support for IPv6 Networks | You can now create multi-network policies for IPv6 networks. For more information, see Supporting multi-network policies in IPv6 networks. |
3.3.2. Telco core 4.15 use model overview
The Telco core reference design specification (RDS) describes a platform that supports large-scale telco applications including control plane functions such as signaling and aggregation. It also includes some centralized data plane functions, for example, user plane functions (UPF). These functions generally require scalability, complex networking support, resilient software-defined storage, and support performance requirements that are less stringent and constrained than far-edge deployments like RAN.
Telco core use model architecture
The networking prerequisites for telco core functions are diverse and encompass an array of networking attributes and performance benchmarks. IPv6 is mandatory, with dual-stack configurations being prevalent. Certain functions demand maximum throughput and transaction rates, necessitating user plane networking support such as DPDK. Other functions adhere to conventional cloud-native patterns and can use solutions such as OVN-K, kernel networking, and load balancing.
Telco core clusters are configured as standard three control plane clusters with worker nodes configured with the stock non real-time (RT) kernel. To support workloads with varying networking and performance requirements, worker nodes are segmented using MachineConfigPool
CRs. For example, this is done to separate non-user data plane nodes from high-throughput nodes. To support the required telco operational features, the clusters have a standard set of Operator Lifecycle Manager (OLM) Day 2 Operators installed.
3.3.2.1. Common baseline model
The following configurations and use model description are applicable to all telco core use cases.
- Cluster
The cluster conforms to these requirements:
- High-availability (3+ supervisor nodes) control plane
- Non-schedulable supervisor nodes
- Storage
- Core use cases require persistent storage as provided by external OpenShift Data Foundation. For more information, see the "Storage" subsection in "Reference core design components".
- Networking
Telco core clusters networking conforms to these requirements:
- Dual stack IPv4/IPv6
- Fully disconnected: Clusters do not have access to public networking at any point in their lifecycle.
- Multiple networks: Segmented networking provides isolation between OAM, signaling, and storage traffic.
- Cluster network type: OVN-Kubernetes is required for IPv6 support.
Core clusters have multiple layers of networking supported by underlying RHCOS, SR-IOV Operator, Load Balancer, and other components detailed in the following "Networking" section. At a high level these layers include:
Cluster networking: The cluster network configuration is defined and applied through the installation configuration. Updates to the configuration can be done at day-2 through the NMState Operator. Initial configuration can be used to establish:
- Host interface configuration
- A/A Bonding (Link Aggregation Control Protocol (LACP))
Secondary or additional networks: OpenShift CNI is configured through the Network
additionalNetworks
or NetworkAttachmentDefinition CRs.- MACVLAN
- Application Workload: User plane networking is running in cloud-native network functions (CNFs).
- Service Mesh
- Use of Service Mesh by telco CNFs is very common. It is expected that all core clusters will include a Service Mesh implementation. Service Mesh implementation and configuration is outside the scope of this specification.
3.3.2.1.1. Engineering Considerations common use model
The following engineering considerations are relevant for the common use model.
- Worker nodes
- Worker nodes run on Intel 3rd Generation Xeon (IceLake) processors or newer. Alternatively, if using Skylake or earlier processors, the mitigations for silicon security vulnerabilities such as Spectre must be disabled; failure to do so may result in a significant 40 percent decrease in transaction performance.
-
IRQ Balancing is enabled on worker nodes. The
PerformanceProfile
setsgloballyDisableIrqLoadBalancing: false
. Guaranteed QoS Pods are annotated to ensure isolation as described in "CPU partitioning and performance tuning" subsection in "Reference core design components" section.
- All nodes
- Hyper-Threading is enabled on all nodes
-
CPU architecture is
x86_64
only - Nodes are running the stock (non-RT) kernel
- Nodes are not configured for workload partitioning
The balance of node configuration between power management and maximum performance varies between MachineConfigPools
in the cluster. This configuration is consistent for all nodes within a MachineConfigPool
.
- CPU partitioning
-
CPU partitioning is configured using the PerformanceProfile and applied on a per
MachineConfigPool
basis. See the "CPU partitioning and performance tuning" subsection in "Reference core design components".
3.3.2.1.2. Application workloads
Application workloads running on core clusters might include a mix of high-performance networking CNFs and traditional best-effort or burstable pod workloads.
Guaranteed QoS scheduling is available to pods that require exclusive or dedicated use of CPUs due to performance or security requirements. Typically pods hosting high-performance and low-latency-sensitive Cloud Native Functions (CNFs) utilizing user plane networking with DPDK necessitate the exclusive utilization of entire CPUs. This is accomplished through node tuning and guaranteed Quality of Service (QoS) scheduling. For pods that require exclusive use of CPUs, be aware of the potential implications of hyperthreaded systems and configure them to request multiples of 2 CPUs when the entire core (2 hyperthreads) must be allocated to the pod.
Pods running network functions that do not require the high throughput and low latency networking are typically scheduled with best-effort or burstable QoS and do not require dedicated or isolated CPU cores.
- Description of limits
- CNF applications should conform to the latest version of the Red Hat Best Practices for Kubernetes guide.
For a mix of best-effort and burstable QoS pods.
-
Guaranteed QoS pods might be used but require correct configuration of reserved and isolated CPUs in the
PerformanceProfile
. - Guaranteed QoS Pods must include annotations for fully isolating CPUs.
- Best effort and burstable pods are not guaranteed exclusive use of a CPU. Workloads might be preempted by other workloads, operating system daemons, or kernel tasks.
-
Guaranteed QoS pods might be used but require correct configuration of reserved and isolated CPUs in the
Exec probes should be avoided unless there is no viable alternative.
- Do not use exec probes if a CNF is using CPU pinning.
-
Other probe implementations, for example
httpGet/tcpSocket
, should be used.
NoteStartup probes require minimal resources during steady-state operation. The limitation on exec probes applies primarily to liveness and readiness probes.
- Signaling workload
- Signaling workloads typically use SCTP, REST, gRPC, or similar TCP or UDP protocols.
- The transactions per second (TPS) is in the order of hundreds of thousands using secondary CNI (multus) configured as MACVLAN or SR-IOV.
- Signaling workloads run in pods with either guaranteed or burstable QoS.
3.3.3. Telco core reference design components
The following sections describe the various OpenShift Container Platform components and configurations that you use to configure and deploy clusters to run telco core workloads.
3.3.3.1. CPU partitioning and performance tuning
- New in this release
- No reference design updates in this release
- Description
-
CPU partitioning allows for the separation of sensitive workloads from generic purposes, auxiliary processes, interrupts, and driver work queues to achieve improved performance and latency. The CPUs allocated to those auxiliary processes are referred to as
reserved
in the following sections. In hyperthreaded systems, a CPU is one hyperthread. - Limits and requirements
The operating system needs a certain amount of CPU to perform all the support tasks including kernel networking.
- A system with just user plane networking applications (DPDK) needs at least one Core (2 hyperthreads when enabled) reserved for the operating system and the infrastructure components.
- A system with Hyper-Threading enabled must always put all core sibling threads to the same pool of CPUs.
- The set of reserved and isolated cores must include all CPU cores.
- Core 0 of each NUMA node must be included in the reserved CPU set.
Isolated cores might be impacted by interrupts. The following annotations must be attached to the pod if guaranteed QoS pods require full use of the CPU:
cpu-load-balancing.crio.io: "disable" cpu-quota.crio.io: "disable" irq-load-balancing.crio.io: "disable"
When per-pod power management is enabled with
PerformanceProfile.workloadHints.perPodPowerManagement
the following annotations must also be attached to the pod if guaranteed QoS pods require full use of the CPU:cpu-c-states.crio.io: "disable" cpu-freq-governor.crio.io: "performance"
- Engineering considerations
-
The minimum reserved capacity (
systemReserved
) required can be found by following the guidance in "Which amount of CPU and memory are recommended to reserve for the system in OpenShift 4 nodes?" - The actual required reserved CPU capacity depends on the cluster configuration and workload attributes.
- This reserved CPU value must be rounded up to a full core (2 hyper-thread) alignment.
- Changes to the CPU partitioning will drain and reboot the nodes in the MCP.
- The reserved CPUs reduce the pod density, as the reserved CPUs are removed from the allocatable capacity of the OpenShift node.
- The real-time workload hint should be enabled if the workload is real-time capable.
- Hardware without Interrupt Request (IRQ) affinity support will impact isolated CPUs. To ensure that pods with guaranteed CPU QoS have full use of allocated CPU, all hardware in the server must support IRQ affinity.
-
OVS dynamically manages its
cpuset
configuration to adapt to network traffic needs. You do not need to reserve additional CPUs for handling high network throughput on the primary CNI.
-
The minimum reserved capacity (
3.3.3.2. Service Mesh
- Description
- Telco core CNFs typically require a service mesh implementation. The specific features and performance required are dependent on the application. The selection of service mesh implementation and configuration is outside the scope of this documentation. The impact of service mesh on cluster resource utilization and performance, including additional latency introduced into pod networking, must be accounted for in the overall solution engineering.
Additional resources
3.3.3.3. Networking
OpenShift Container Platform networking is an ecosystem of features, plugins, and advanced networking capabilities that extend Kubernetes networking with the advanced networking-related features that your cluster needs to manage its network traffic for one or multiple hybrid clusters.
Additional resources
3.3.3.3.1. Cluster Network Operator (CNO)
- New in this release
- No reference design updates in this release
- Description
The CNO deploys and manages the cluster network components including the default OVN-Kubernetes network plugin during OpenShift Container Platform cluster installation. It allows configuring primary interface MTU settings, OVN gateway modes to use node routing tables for pod egress, and additional secondary networks such as MACVLAN.
In support of network traffic separation, multiple network interfaces are configured through the CNO. Traffic steering to these interfaces is configured through static routes applied by using the NMState Operator. To ensure that pod traffic is properly routed, OVN-K is configured with the
routingViaHost
option enabled. This setting uses the kernel routing table and the applied static routes rather than OVN for pod egress traffic.The Whereabouts CNI plugin is used to provide dynamic IPv4 and IPv6 addressing for additional pod network interfaces without the use of a DHCP server.
- Limits and requirements
- OVN-Kubernetes is required for IPv6 support.
- Large MTU cluster support requires connected network equipment to be set to the same or larger value.
- Engineering considerations
-
Pod egress traffic is handled by kernel routing table with the
routingViaHost
option. Appropriate static routes must be configured in the host.
-
Pod egress traffic is handled by kernel routing table with the
Additional resources
3.3.3.3.2. Load Balancer
- New in this release
- No reference design updates in this release
- Description
MetalLB is a load-balancer implementation for bare metal Kubernetes clusters using standard routing protocols. It enables a Kubernetes service to get an external IP address which is also added to the host network for the cluster.
Some use cases might require features not available in MetalLB, for example stateful load balancing. Where necessary, you can use an external third party load balancer. Selection and configuration of an external load balancer is outside the scope of this specification. When an external third party load balancer is used, the integration effort must include enough analysis to ensure all performance and resource utilization requirements are met.
- Limits and requirements
- Stateful load balancing is not supported by MetalLB. An alternate load balancer implementation must be used if this is a requirement for workload CNFs.
- The networking infrastructure must ensure that the external IP address is routable from clients to the host network for the cluster.
- Engineering considerations
- MetalLB is used in BGP mode only for core use case models.
-
For core use models, MetalLB is supported with only the OVN-Kubernetes network provider used in local gateway mode. See
routingViaHost
in the "Cluster Network Operator" section. - BGP configuration in MetalLB varies depending on the requirements of the network and peers.
- Address pools can be configured as needed, allowing variation in addresses, aggregation length, auto assignment, and other relevant parameters.
- The values of parameters in the Bi-Directional Forwarding Detection (BFD) profile should remain close to the defaults. Shorter values might lead to false negatives and impact performance.
Additional resources
3.3.3.3.3. SR-IOV
- New in this release
-
MultiNetworkPolicy
resources can now be applied to SR-IOV networks to enforce network reachability policies.
-
- QinQ is now supported in the SR-IOV Network Operator. This is a Tech Preview feature.
SR-IOV VFs can now receive all multicast traffic via the ‘allmulti’ flag when tuning the CNI. This eliminates the need to add
NET_ADMIN
capability to the pod’s security context constraints (SCCs) and enhances security by minimizing potential vulnerabilities for pods.- Description
- SR-IOV enables physical network interfaces (PFs) to be divided into multiple virtual functions (VFs). VFs can then be assigned to multiple pods to achieve higher throughput performance while keeping the pods isolated. The SR-IOV Network Operator provisions and manages SR-IOV CNI, network device plugin, and other components of the SR-IOV stack.
- Limits and requirements
- The network interface controllers supported are listed in Supported devices
- SR-IOV and IOMMU enablement in BIOS: The SR-IOV Network Operator automatically enables IOMMU on the kernel command line.
- SR-IOV VFs do not receive link state updates from PF. If link down detection is needed, it must be done at the protocol level.
MultiNetworkPolicy
CRs can be applied tonetdevice
networks only. This is because the implementation uses theiptables
tool, which cannot managevfio
interfaces.- Engineering considerations
-
SR-IOV interfaces in
vfio
mode are typically used to enable additional secondary networks for applications that require high throughput or low latency.
Additional resources
3.3.3.3.4. NMState Operator
- New in this release
- No reference design updates in this release
- Description
- The NMState Operator provides a Kubernetes API for performing network configurations across the cluster’s nodes. It enables network interface configurations, static IPs and DNS, VLANs, trunks, bonding, static routes, MTU, and enabling promiscuous mode on the secondary interfaces. The cluster nodes periodically report on the state of each node’s network interfaces to the API server.
- Limits and requirements
- Not applicable
- Engineering considerations
-
The initial networking configuration is applied using
NMStateConfig
content in the installation CRs. The NMState Operator is used only when needed for network updates. -
When SR-IOV virtual functions are used for host networking, the NMState Operator using
NodeNetworkConfigurationPolicy
is used to configure those VF interfaces, for example, VLANs and the MTU.
-
The initial networking configuration is applied using
Additional resources
3.3.3.4. Logging
- New in this release
- No reference design updates in this release
- Description
- The Cluster Logging Operator enables collection and shipping of logs off the node for remote archival and analysis. The reference configuration ships audit and infrastructure logs to a remote archive by using Kafka.
- Limits and requirements
- Not applicable
- Engineering considerations
- The impact of cluster CPU use is based on the number or size of logs generated and the amount of log filtering configured.
- The reference configuration does not include shipping of application logs. Inclusion of application logs in the configuration requires evaluation of the application logging rate and sufficient additional CPU resources allocated to the reserved set.
Additional resources
3.3.3.5. Power Management
- New in this release
- No reference design updates in this release
- Description
The Performance Profile can be used to configure a cluster in a high power, low power, or mixed mode. The choice of power mode depends on the characteristics of the workloads running on the cluster, particularly how sensitive they are to latency. Configure the maximum latency for a low-latency pod by using the per-pod power management C-states feature.
For more information, see Configuring power saving for nodes.
- Limits and requirements
- Power configuration relies on appropriate BIOS configuration, for example, enabling C-states and P-states. Configuration varies between hardware vendors.
- Engineering considerations
-
Latency: To ensure that latency-sensitive workloads meet their requirements, you will need either a high-power configuration or a per-pod power management configuration. Per-pod power management is only available for
Guaranteed
QoS Pods with dedicated pinned CPUs.
-
Latency: To ensure that latency-sensitive workloads meet their requirements, you will need either a high-power configuration or a per-pod power management configuration. Per-pod power management is only available for
3.3.3.6. Storage
- Overview
Cloud native storage services can be provided by multiple solutions including OpenShift Data Foundation from Red Hat or third parties.
OpenShift Data Foundation is a Ceph based software-defined storage solution for containers. It provides block storage, file system storage, and on-premises object storage, which can be dynamically provisioned for both persistent and non-persistent data requirements. Telco core applications require persistent storage.
NoteAll storage data may not be encrypted in flight. To reduce risk, isolate the storage network from other cluster networks. The storage network must not be reachable, or routable, from other cluster networks. Only nodes directly attached to the storage network should be allowed to gain access to it.
3.3.3.6.1. OpenShift Data Foundation
- New in this release
- No reference design updates in this release
- Description
- Red Hat OpenShift Data Foundation is a software-defined storage service for containers. For Telco core clusters, storage support is provided by OpenShift Data Foundation storage services running externally to the application workload cluster. OpenShift Data Foundation supports separation of storage traffic using secondary CNI networks.
- Limits and requirements
- In an IPv4/IPv6 dual-stack networking environment, OpenShift Data Foundation uses IPv4 addressing. For more information, see Support OpenShift dual stack with OpenShift Data Foundation using IPv4.
- Engineering considerations
- OpenShift Data Foundation network traffic should be isolated from other traffic on a dedicated network, for example, by using VLAN isolation.
3.3.3.6.2. Other Storage
Other storage solutions can be used to provide persistent storage for core clusters. The configuration and integration of these solutions is outside the scope of the telco core RDS. Integration of the storage solution into the core cluster must include correct sizing and performance analysis to ensure the storage meets overall performance and resource utilization requirements.
Additional resources
3.3.3.7. Monitoring
- New in this release
- No reference design updates in this release
- Description
The Cluster Monitoring Operator (CMO) is included by default on all OpenShift clusters and provides monitoring (metrics, dashboards, and alerting) for the platform components and optionally user projects as well.
Configuration of the monitoring operator allows for customization, including:
- Default retention period
- Custom alert rules
The default handling of pod CPU and memory metrics is based on upstream Kubernetes
cAdvisor
and makes a tradeoff that prefers handling of stale data over metric accuracy. This leads to spiky data that will create false triggers of alerts over user-specified thresholds. OpenShift supports an opt-in dedicated service monitor feature creating an additional set of pod CPU and memory metrics that do not suffer from the spiky behavior. For additional information, see this solution guide.In addition to default configuration, the following metrics are expected to be configured for telco core clusters:
- Pod CPU and memory metrics and alerts for user workloads
- Limits and requirements
- Monitoring configuration must enable the dedicated service monitor feature for accurate representation of pod metrics
- Engineering considerations
- The Prometheus retention period is specified by the user. The value used is a tradeoff between operational requirements for maintaining historical data on the cluster against CPU and storage resources. Longer retention periods increase the need for storage and require additional CPU to manage the indexing of data.
Additional resources
3.3.3.8. Scheduling
- New in this release
- No reference design updates in this release
- Description
- The scheduler is a cluster-wide component responsible for selecting the right node for a given workload. It is a core part of the platform and does not require any specific configuration in the common deployment scenarios. However, there are few specific use cases described in the following section. NUMA-aware scheduling can be enabled through the NUMA Resources Operator. For more information, see Scheduling NUMA-aware workloads.
- Limits and requirements
The default scheduler does not understand the NUMA locality of workloads. It only knows about the sum of all free resources on a worker node. This might cause workloads to be rejected when scheduled to a node with Topology manager policy set to
single-numa-node
orrestricted
.- For example, consider a pod requesting 6 CPUs and being scheduled to an empty node that has 4 CPUs per NUMA node. The total allocatable capacity of the node is 8 CPUs and the scheduler will place the pod there. The node local admission will fail, however, as there are only 4 CPUs available in each of the NUMA nodes.
-
All clusters with multi-NUMA nodes are required to use the NUMA Resources Operator. The
machineConfigPoolSelector
of the NUMA Resources Operator must select all nodes where NUMA aligned scheduling is needed.
- All machine config pools must have consistent hardware configuration for example all nodes are expected to have the same NUMA zone count.
- Engineering considerations
- Pods might require annotations for correct scheduling and isolation. For more information on annotations, see CPU partitioning and performance tuning.
-
You can configure SR-IOV virtual function NUMA affinity to be ignored during scheduling by using the
excludeTopology
field inSriovNetworkNodePolicy
CR.
Additional resources
3.3.3.9. Installation
- New in this release, Description
Telco core clusters can be installed using the Agent Based Installer (ABI). This method allows users to install OpenShift Container Platform on bare metal servers without requiring additional servers or VMs for managing the installation. The ABI installer can be run on any system for example a laptop to generate an ISO installation image. This ISO is used as the installation media for the cluster supervisor nodes. Progress can be monitored using the ABI tool from any system with network connectivity to the supervisor node’s API interfaces.
- Installation from declarative CRs
- Does not require additional servers to support installation
- Supports install in disconnected environment
- Limits and requirements
- Disconnected installation requires a reachable registry with all required content mirrored.
- Engineering considerations
- Networking configuration should be applied as NMState configuration during installation in preference to day-2 configuration by using the NMState Operator.
Additional resources
Installing an OpenShift Container Platform cluster with the Agent-based Installer
3.3.3.10. Security
- New in this release
- No reference design updates in this release
- Description
Telco operators are security conscious and require clusters to be hardened against multiple attack vectors. Within OpenShift Container Platform, there is no single component or feature responsible for securing a cluster. This section provides details of security-oriented features and configuration for the use models covered in this specification.
- SecurityContextConstraints: All workload pods should be run with restricted-v2 or restricted SCC.
-
Seccomp: All pods should be run with the
RuntimeDefault
(or stronger) seccomp profile. - Rootless DPDK pods: Many user-plane networking (DPDK) CNFs require pods to run with root privileges. With this feature, a conformant DPDK pod can be run without requiring root privileges. Rootless DPDK pods create a tap device in a rootless pod that injects traffic from a DPDK application to the kernel.
- Storage: The storage network should be isolated and non-routable to other cluster networks. See the "Storage" section for additional details.
- Limits and requirements
Rootless DPDK pods requires the following additional configuration steps:
-
Configure the TAP plugin with the
container_t
SELinux context. -
Enable the
container_use_devices
SELinux boolean on the hosts.
-
Configure the TAP plugin with the
- Engineering considerations
-
For rootless DPDK pod support, the SELinux boolean
container_use_devices
must be enabled on the host for the TAP device to be created. This introduces a security risk that is acceptable for short to mid-term use. Other solutions will be explored.
-
For rootless DPDK pod support, the SELinux boolean
Additional resources
3.3.3.11. Scalability
- New in this release
- No reference design updates in this release
- Description
Clusters will scale to the sizing listed in the limits and requirements section.
Scaling of workloads is described in the use model section.
- Limits and requirements
- Cluster scales to at least 120 nodes
- Engineering considerations
- Not applicable
3.3.3.12. Additional configuration
3.3.3.12.1. Disconnected environment
- Description
Telco core clusters are expected to be installed in networks without direct access to the internet. All container images needed to install, configure, and operator the cluster must be available in a disconnected registry. This includes OpenShift Container Platform images, day-2 Operator Lifecycle Manager (OLM) Operator images, and application workload images. The use of a disconnected environment provides multiple benefits, for example:
- Limiting access to the cluster for security
- Curated content: The registry is populated based on curated and approved updates for the clusters
- Limits and requirements
- A unique name is required for all custom CatalogSources. Do not reuse the default catalog names.
- A valid time source must be configured as part of cluster installation.
- Engineering considerations
- Not applicable
Additional resources
3.3.3.12.2. Kernel
- New in this release
- No reference design updates in this release
- Description
The user can install the following kernel modules by using
MachineConfig
to provide extended kernel functionality to CNFs:- sctp
- ip_gre
- ip6_tables
- ip6t_REJECT
- ip6table_filter
- ip6table_mangle
- iptable_filter
- iptable_mangle
- iptable_nat
- xt_multiport
- xt_owner
- xt_REDIRECT
- xt_statistic
- xt_TCPMSS
- Limits and requirements
- Use of functionality available through these kernel modules must be analyzed by the user to determine the impact on CPU load, system performance, and ability to sustain KPI.
NoteOut of tree drivers are not supported.
- Engineering considerations
- Not applicable
3.3.4. Telco core 4.15 reference configuration CRs
Use the following custom resources (CRs) to configure and deploy OpenShift Container Platform clusters with the telco core profile. Use the CRs to form the common baseline used in all the specific use models unless otherwise indicated.
3.3.4.1. Resource Tuning reference CRs
Component | Reference CR | Optional | New in this release |
---|---|---|---|
System reserved capacity | Yes | No | |
System reserved capacity | Yes | No |
3.3.4.2. Storage reference CRs
Component | Reference CR | Optional | New in this release |
---|---|---|---|
External Red Hat OpenShift Data Foundation configuration | No | Yes | |
External OpenShift Data Foundation configuration | No | No | |
External OpenShift Data Foundation configuration | No | No | |
External OpenShift Data Foundation configuration | No | No |
3.3.4.3. Networking reference CRs
Component | Reference CR | Optional | New in this release |
---|---|---|---|
Baseline | No | No | |
Baseline | Yes | Yes | |
Load balancer | No | No | |
Load balancer | No | No | |
Load balancer | No | No | |
Load balancer | No | No | |
Load balancer | No | No | |
Load balancer | Yes | No | |
Load balancer | Yes | No | |
Load balancer | No | No | |
Multus - Tap CNI for rootless DPDK pod | No | No | |
SR-IOV Network Operator | Yes | No | |
SR-IOV Network Operator | No | Yes | |
SR-IOV Network Operator | No | Yes | |
SR-IOV Network Operator | No | No | |
SR-IOV Network Operator | No | No | |
SR-IOV Network Operator | No | No |
3.3.4.4. Scheduling reference CRs
Component | Reference CR | Optional | New in this release |
---|---|---|---|
NUMA-aware scheduler | No | No | |
NUMA-aware scheduler | No | No |
3.3.4.5. Other reference CRs
Component | Reference CR | Optional | New in this release |
---|---|---|---|
Additional kernel modules | Yes | No | |
Additional kernel modules | Yes | No | |
Additional kernel modules | Yes | No | |
Cluster logging | No | No | |
Cluster logging | No | No | |
Cluster logging | No | No | |
Cluster logging | No | No | |
Cluster logging | No | Yes | |
Disconnected configuration | No | No | |
Disconnected configuration | No | No | |
Disconnected configuration | No | No | |
Monitoring and observability | Yes | No | |
Power management | No | No |
3.3.4.6. YAML reference
3.3.4.6.1. Resource Tuning reference YAML
control-plane-system-reserved.yaml
# optional # count: 1 apiVersion: machineconfiguration.openshift.io/v1 kind: KubeletConfig metadata: name: autosizing-master spec: autoSizingReserved: true machineConfigPoolSelector: matchLabels: pools.operator.machineconfiguration.openshift.io/master: ""
pid-limits-cr.yaml
# optional # count: 1 apiVersion: machineconfiguration.openshift.io/v1 kind: ContainerRuntimeConfig metadata: name: 99-change-pidslimit-custom spec: machineConfigPoolSelector: matchLabels: # Set to appropriate MCP pools.operator.machineconfiguration.openshift.io/master: "" containerRuntimeConfig: pidsLimit: $pidsLimit # Example: #pidsLimit: 4096
3.3.4.6.2. Storage reference YAML
01-rook-ceph-external-cluster-details.secret.yaml
# required # count: 1 --- apiVersion: v1 kind: Secret metadata: name: rook-ceph-external-cluster-details namespace: openshift-storage type: Opaque data: # encoded content has been made generic external_cluster_details: eyJuYW1lIjoicm9vay1jZXBoLW1vbi1lbmRwb2ludHMiLCJraW5kIjoiQ29uZmlnTWFwIiwiZGF0YSI6eyJkYXRhIjoiY2VwaHVzYTE9MS4yLjMuNDo2Nzg5IiwibWF4TW9uSWQiOiIwIiwibWFwcGluZyI6Int9In19LHsibmFtZSI6InJvb2stY2VwaC1tb24iLCJraW5kIjoiU2VjcmV0IiwiZGF0YSI6eyJhZG1pbi1zZWNyZXQiOiJhZG1pbi1zZWNyZXQiLCJmc2lkIjoiMTExMTExMTEtMTExMS0xMTExLTExMTEtMTExMTExMTExMTExIiwibW9uLXNlY3JldCI6Im1vbi1zZWNyZXQifX0seyJuYW1lIjoicm9vay1jZXBoLW9wZXJhdG9yLWNyZWRzIiwia2luZCI6IlNlY3JldCIsImRhdGEiOnsidXNlcklEIjoiY2xpZW50LmhlYWx0aGNoZWNrZXIiLCJ1c2VyS2V5IjoiYzJWamNtVjAifX0seyJuYW1lIjoibW9uaXRvcmluZy1lbmRwb2ludCIsImtpbmQiOiJDZXBoQ2x1c3RlciIsImRhdGEiOnsiTW9uaXRvcmluZ0VuZHBvaW50IjoiMS4yLjMuNCwxLjIuMy4zLDEuMi4zLjIiLCJNb25pdG9yaW5nUG9ydCI6IjkyODMifX0seyJuYW1lIjoiY2VwaC1yYmQiLCJraW5kIjoiU3RvcmFnZUNsYXNzIiwiZGF0YSI6eyJwb29sIjoib2RmX3Bvb2wifX0seyJuYW1lIjoicm9vay1jc2ktcmJkLW5vZGUiLCJraW5kIjoiU2VjcmV0IiwiZGF0YSI6eyJ1c2VySUQiOiJjc2ktcmJkLW5vZGUiLCJ1c2VyS2V5IjoiIn19LHsibmFtZSI6InJvb2stY3NpLXJiZC1wcm92aXNpb25lciIsImtpbmQiOiJTZWNyZXQiLCJkYXRhIjp7InVzZXJJRCI6ImNzaS1yYmQtcHJvdmlzaW9uZXIiLCJ1c2VyS2V5IjoiYzJWamNtVjAifX0seyJuYW1lIjoicm9vay1jc2ktY2VwaGZzLXByb3Zpc2lvbmVyIiwia2luZCI6IlNlY3JldCIsImRhdGEiOnsiYWRtaW5JRCI6ImNzaS1jZXBoZnMtcHJvdmlzaW9uZXIiLCJhZG1pbktleSI6IiJ9fSx7Im5hbWUiOiJyb29rLWNzaS1jZXBoZnMtbm9kZSIsImtpbmQiOiJTZWNyZXQiLCJkYXRhIjp7ImFkbWluSUQiOiJjc2ktY2VwaGZzLW5vZGUiLCJhZG1pbktleSI6ImMyVmpjbVYwIn19LHsibmFtZSI6ImNlcGhmcyIsImtpbmQiOiJTdG9yYWdlQ2xhc3MiLCJkYXRhIjp7ImZzTmFtZSI6ImNlcGhmcyIsInBvb2wiOiJtYW5pbGFfZGF0YSJ9fQ==
02-ocs-external-storagecluster.yaml
# required # count: 1 --- apiVersion: ocs.openshift.io/v1 kind: StorageCluster metadata: name: ocs-external-storagecluster namespace: openshift-storage spec: externalStorage: enable: true labelSelector: {}
odfNS.yaml
# required: yes # count: 1 --- apiVersion: v1 kind: Namespace metadata: name: openshift-storage annotations: workload.openshift.io/allowed: management labels: openshift.io/cluster-monitoring: "true"
odfOperGroup.yaml
# required: yes # count: 1 --- apiVersion: operators.coreos.com/v1 kind: OperatorGroup metadata: name: openshift-storage-operatorgroup namespace: openshift-storage spec: targetNamespaces: - openshift-storage
3.3.4.6.3. Networking reference YAML
Network.yaml
# required # count: 1 apiVersion: operator.openshift.io/v1 kind: Network metadata: name: cluster spec: defaultNetwork: ovnKubernetesConfig: gatewayConfig: routingViaHost: true # additional networks are optional and may alternatively be specified using NetworkAttachmentDefinition CRs additionalNetworks: [$additionalNetworks] # eg #- name: add-net-1 # namespace: app-ns-1 # rawCNIConfig: '{ "cniVersion": "0.3.1", "name": "add-net-1", "plugins": [{"type": "macvlan", "master": "bond1", "ipam": {}}] }' # type: Raw #- name: add-net-2 # namespace: app-ns-1 # rawCNIConfig: '{ "cniVersion": "0.4.0", "name": "add-net-2", "plugins": [ {"type": "macvlan", "master": "bond1", "mode": "private" },{ "type": "tuning", "name": "tuning-arp" }] }' # type: Raw # Enable to use MultiNetworkPolicy CRs useMultiNetworkPolicy: true
networkAttachmentDefinition.yaml
# optional # copies: 0-N apiVersion: "k8s.cni.cncf.io/v1" kind: NetworkAttachmentDefinition metadata: name: $name namespace: $ns spec: nodeSelector: kubernetes.io/hostname: $nodeName config: $config #eg #config: '{ # "cniVersion": "0.3.1", # "name": "external-169", # "type": "vlan", # "master": "ens8f0", # "mode": "bridge", # "vlanid": 169, # "ipam": { # "type": "static", # } #}'
addr-pool.yaml
# required # count: 1-N apiVersion: metallb.io/v1beta1 kind: IPAddressPool metadata: name: $name # eg addresspool3 namespace: metallb-system annotations: metallb.universe.tf/address-pool: $name # eg addresspool3 spec: ############## # Expected variation in this configuration addresses: [$pools] #- 3.3.3.0/24 autoAssign: true ##############
bfd-profile.yaml
# required # count: 1-N apiVersion: metallb.io/v1beta1 kind: BFDProfile metadata: name: bfdprofile namespace: metallb-system spec: ################ # These values may vary. Recommended values are included as default receiveInterval: 150 # default 300ms transmitInterval: 150 # default 300ms #echoInterval: 300 # default 50ms detectMultiplier: 10 # default 3 echoMode: true passiveMode: true minimumTtl: 5 # default 254 # ################
bgp-advr.yaml
# required # count: 1-N apiVersion: metallb.io/v1beta1 kind: BGPAdvertisement metadata: name: $name # eg bgpadvertisement-1 namespace: metallb-system spec: ipAddressPools: [$pool] # eg: # - addresspool3 peers: [$peers] # eg: # - peer-one communities: [$communities] # Note correlation with address pool. # eg: # - 65535:65282 aggregationLength: 32 aggregationLengthV6: 128 localPref: 100
bgp-peer.yaml
# required # count: 1-N apiVersion: metallb.io/v1beta1 kind: BGPPeer metadata: name: $name namespace: metallb-system spec: peerAddress: $ip # eg 192.168.1.2 peerASN: $peerasn # eg 64501 myASN: $myasn # eg 64500 routerID: $id # eg 10.10.10.10 bfdProfile: bfdprofile
metallb.yaml
# required # count: 1 apiVersion: metallb.io/v1beta1 kind: MetalLB metadata: name: metallb namespace: metallb-system spec: nodeSelector: node-role.kubernetes.io/worker: ""
metallbNS.yaml
# required: yes # count: 1 --- apiVersion: v1 kind: Namespace metadata: name: metallb-system annotations: workload.openshift.io/allowed: management labels: openshift.io/cluster-monitoring: "true"
metallbOperGroup.yaml
# required: yes # count: 1 --- apiVersion: operators.coreos.com/v1 kind: OperatorGroup metadata: name: metallb-operator namespace: metallb-system
metallbSubscription.yaml
# required: yes # count: 1 --- apiVersion: operators.coreos.com/v1alpha1 kind: Subscription metadata: name: metallb-operator-sub namespace: metallb-system spec: channel: stable name: metallb-operator source: redhat-operators-disconnected sourceNamespace: openshift-marketplace installPlanApproval: Automatic
mc_rootless_pods_selinux.yaml
apiVersion: machineconfiguration.openshift.io/v1 kind: MachineConfig metadata: labels: machineconfiguration.openshift.io/role: worker name: 99-worker-setsebool spec: config: ignition: version: 3.2.0 systemd: units: - contents: | [Unit] Description=Set SELinux boolean for tap cni plugin Before=kubelet.service [Service] Type=oneshot ExecStart=/sbin/setsebool container_use_devices=on RemainAfterExit=true [Install] WantedBy=multi-user.target graphical.target enabled: true name: setsebool.service
sriovNetwork.yaml
# optional (though expected for all) # count: 0-N apiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetwork metadata: name: $name # eg sriov-network-abcd namespace: openshift-sriov-network-operator spec: capabilities: "$capabilities" # eg '{"mac": true, "ips": true}' ipam: "$ipam" # eg '{ "type": "host-local", "subnet": "10.3.38.0/24" }' networkNamespace: $nns # eg cni-test resourceName: $resource # eg resourceTest
sriovNetworkNodePolicy.yaml
# optional (though expected in all deployments) # count: 0-N apiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetworkNodePolicy metadata: name: $name namespace: openshift-sriov-network-operator spec: {} # $spec # eg #deviceType: netdevice #nicSelector: # deviceID: "1593" # pfNames: # - ens8f0np0#0-9 # rootDevices: # - 0000:d8:00.0 # vendor: "8086" #nodeSelector: # kubernetes.io/hostname: host.sample.lab #numVfs: 20 #priority: 99 #excludeTopology: true #resourceName: resourceNameABCD
SriovOperatorConfig.yaml
# required # count: 1 --- apiVersion: sriovnetwork.openshift.io/v1 kind: SriovOperatorConfig metadata: name: default namespace: openshift-sriov-network-operator spec: configDaemonNodeSelector: node-role.kubernetes.io/worker: "" enableInjector: true enableOperatorWebhook: true
SriovSubscription.yaml
# required: yes # count: 1 apiVersion: operators.coreos.com/v1alpha1 kind: Subscription metadata: name: sriov-network-operator-subscription namespace: openshift-sriov-network-operator spec: channel: "stable" name: sriov-network-operator source: redhat-operators-disconnected sourceNamespace: openshift-marketplace installPlanApproval: Automatic
SriovSubscriptionNS.yaml
# required: yes # count: 1 apiVersion: v1 kind: Namespace metadata: name: openshift-sriov-network-operator annotations: workload.openshift.io/allowed: management
SriovSubscriptionOperGroup.yaml
# required: yes # count: 1 apiVersion: operators.coreos.com/v1 kind: OperatorGroup metadata: name: sriov-network-operators namespace: openshift-sriov-network-operator spec: targetNamespaces: - openshift-sriov-network-operator
3.3.4.6.4. Scheduling reference YAML
nrop.yaml
# Optional # count: 1 apiVersion: nodetopology.openshift.io/v1 kind: NUMAResourcesOperator metadata: name: numaresourcesoperator spec: nodeGroups: - config: # Periodic is the default setting infoRefreshMode: Periodic machineConfigPoolSelector: matchLabels: # This label must match the pool(s) you want to run NUMA-aligned workloads pools.operator.machineconfiguration.openshift.io/worker: ""
sched.yaml
# Optional # count: 1 apiVersion: nodetopology.openshift.io/v1 kind: NUMAResourcesScheduler metadata: name: numaresourcesscheduler spec: #cacheResyncPeriod: "0" # Image spec should be the latest for the release imageSpec: "registry.redhat.io/openshift4/noderesourcetopology-scheduler-rhel9:v4.14.0" #logLevel: "Trace" schedulerName: topo-aware-scheduler
3.3.4.6.5. Other reference YAML
control-plane-load-kernel-modules.yaml
# optional # count: 1 apiVersion: machineconfiguration.openshift.io/v1 kind: MachineConfig metadata: labels: machineconfiguration.openshift.io/role: master name: 40-load-kernel-modules-control-plane spec: config: # Release info found in https://github.com/coreos/butane/releases ignition: version: 3.2.0 storage: files: - contents: source: data:, mode: 420 overwrite: true path: /etc/modprobe.d/kernel-blacklist.conf - contents: source: data:text/plain;charset=utf-8;base64,aXBfZ3JlCmlwNl90YWJsZXMKaXA2dF9SRUpFQ1QKaXA2dGFibGVfZmlsdGVyCmlwNnRhYmxlX21hbmdsZQppcHRhYmxlX2ZpbHRlcgppcHRhYmxlX21hbmdsZQppcHRhYmxlX25hdAp4dF9tdWx0aXBvcnQKeHRfb3duZXIKeHRfUkVESVJFQ1QKeHRfc3RhdGlzdGljCnh0X1RDUE1TUwp4dF91MzI= mode: 420 overwrite: true path: /etc/modules-load.d/kernel-load.conf
sctp_module_mc.yaml
# optional # count: 1 apiVersion: machineconfiguration.openshift.io/v1 kind: MachineConfig metadata: labels: machineconfiguration.openshift.io/role: worker name: load-sctp-module spec: config: ignition: version: 2.2.0 storage: files: - contents: source: data:, verification: {} filesystem: root mode: 420 path: /etc/modprobe.d/sctp-blacklist.conf - contents: source: data:text/plain;charset=utf-8;base64,c2N0cA== filesystem: root mode: 420 path: /etc/modules-load.d/sctp-load.conf
worker-load-kernel-modules.yaml
# optional # count: 1 apiVersion: machineconfiguration.openshift.io/v1 kind: MachineConfig metadata: labels: machineconfiguration.openshift.io/role: worker name: 40-load-kernel-modules-worker spec: config: # Release info found in https://github.com/coreos/butane/releases ignition: version: 3.2.0 storage: files: - contents: source: data:, mode: 420 overwrite: true path: /etc/modprobe.d/kernel-blacklist.conf - contents: source: data:text/plain;charset=utf-8;base64,aXBfZ3JlCmlwNl90YWJsZXMKaXA2dF9SRUpFQ1QKaXA2dGFibGVfZmlsdGVyCmlwNnRhYmxlX21hbmdsZQppcHRhYmxlX2ZpbHRlcgppcHRhYmxlX21hbmdsZQppcHRhYmxlX25hdAp4dF9tdWx0aXBvcnQKeHRfb3duZXIKeHRfUkVESVJFQ1QKeHRfc3RhdGlzdGljCnh0X1RDUE1TUwp4dF91MzI= mode: 420 overwrite: true path: /etc/modules-load.d/kernel-load.conf
ClusterLogForwarder.yaml
# required # count: 1 apiVersion: logging.openshift.io/v1 kind: ClusterLogForwarder metadata: name: instance namespace: openshift-logging spec: outputs: - type: "kafka" name: kafka-open url: tcp://10.11.12.13:9092/test pipelines: - inputRefs: - infrastructure #- application - audit labels: label1: test1 label2: test2 label3: test3 label4: test4 label5: test5 name: all-to-default outputRefs: - kafka-open
ClusterLogging.yaml
# required # count: 1 apiVersion: logging.openshift.io/v1 kind: ClusterLogging metadata: name: instance namespace: openshift-logging spec: collection: type: vector managementState: Managed
ClusterLogNS.yaml
--- apiVersion: v1 kind: Namespace metadata: name: openshift-logging annotations: workload.openshift.io/allowed: management
ClusterLogOperGroup.yaml
--- apiVersion: operators.coreos.com/v1 kind: OperatorGroup metadata: name: cluster-logging namespace: openshift-logging spec: targetNamespaces: - openshift-logging
ClusterLogSubscription.yaml
apiVersion: operators.coreos.com/v1alpha1 kind: Subscription metadata: name: cluster-logging namespace: openshift-logging spec: channel: "stable" name: cluster-logging source: redhat-operators-disconnected sourceNamespace: openshift-marketplace installPlanApproval: Automatic
catalog-source.yaml
# required # count: 1..N apiVersion: operators.coreos.com/v1alpha1 kind: CatalogSource metadata: name: redhat-operators-disconnected namespace: openshift-marketplace spec: displayName: Red Hat Disconnected Operators Catalog image: $imageUrl publisher: Red Hat sourceType: grpc # updateStrategy: # registryPoll: # interval: 1h #status: # connectionState: # lastObservedState: READY
icsp.yaml
# required # count: 1 apiVersion: operator.openshift.io/v1alpha1 kind: ImageContentSourcePolicy metadata: name: disconnected-internal-icsp spec: repositoryDigestMirrors: [] # - $mirrors
operator-hub.yaml
# required # count: 1 apiVersion: config.openshift.io/v1 kind: OperatorHub metadata: name: cluster spec: disableAllDefaultSources: true
monitoring-config-cm.yaml
# optional # count: 1 --- apiVersion: v1 kind: ConfigMap metadata: name: cluster-monitoring-config namespace: openshift-monitoring data: config.yaml: | k8sPrometheusAdapter: dedicatedServiceMonitors: enabled: true prometheusK8s: retention: 15d volumeClaimTemplate: spec: storageClassName: ocs-external-storagecluster-ceph-rbd resources: requests: storage: 100Gi alertmanagerMain: volumeClaimTemplate: spec: storageClassName: ocs-external-storagecluster-ceph-rbd resources: requests: storage: 20Gi
PerformanceProfile.yaml
# required # count: 1 apiVersion: performance.openshift.io/v2 kind: PerformanceProfile metadata: name: $name annotations: # Some pods want the kernel stack to ignore IPv6 router Advertisement. kubeletconfig.experimental: | {"allowedUnsafeSysctls":["net.ipv6.conf.all.accept_ra"]} spec: cpu: # node0 CPUs: 0-17,36-53 # node1 CPUs: 18-34,54-71 # siblings: (0,36), (1,37)... # we want to reserve the first Core of each NUMA socket # # no CPU left behind! all-cpus == isolated + reserved isolated: $isolated # eg 1-17,19-35,37-53,55-71 reserved: $reserved # eg 0,18,36,54 # Guaranteed QoS pods will disable IRQ balancing for cores allocated to the pod. # default value of globallyDisableIrqLoadBalancing is false globallyDisableIrqLoadBalancing: false hugepages: defaultHugepagesSize: 1G pages: # 32GB per numa node - count: $count # eg 64 size: 1G machineConfigPoolSelector: # For SNO: machineconfiguration.openshift.io/role: 'master' pools.operator.machineconfiguration.openshift.io/worker: '' nodeSelector: # For SNO: node-role.kubernetes.io/master: "" node-role.kubernetes.io/worker: "" workloadHints: realTime: false highPowerConsumption: false perPodPowerManagement: true realTimeKernel: enabled: false numa: # All guaranteed QoS containers get resources from a single NUMA node topologyPolicy: "single-numa-node" net: userLevelNetworking: false
Chapter 4. Planning your environment according to object maximums
Consider the following tested object maximums when you plan your OpenShift Container Platform cluster.
These guidelines are based on the largest possible cluster. For smaller clusters, the maximums are lower. There are many factors that influence the stated thresholds, including the etcd version or storage data format.
In most cases, exceeding these numbers results in lower overall performance. It does not necessarily mean that the cluster will fail.
Clusters that experience rapid change, such as those with many starting and stopping pods, can have a lower practical maximum size than documented.
4.1. OpenShift Container Platform tested cluster maximums for major releases
Red Hat does not provide direct guidance on sizing your OpenShift Container Platform cluster. This is because determining whether your cluster is within the supported bounds of OpenShift Container Platform requires careful consideration of all the multidimensional factors that limit the cluster scale.
OpenShift Container Platform supports tested cluster maximums rather than absolute cluster maximums. Not every combination of OpenShift Container Platform version, control plane workload, and network plugin are tested, so the following table does not represent an absolute expectation of scale for all deployments. It might not be possible to scale to a maximum on all dimensions simultaneously. The table contains tested maximums for specific workload and deployment configurations, and serves as a scale guide as to what can be expected with similar deployments.
Maximum type | 4.x tested maximum |
---|---|
Number of nodes | 2,000 [1] |
Number of pods [2] | 150,000 |
Number of pods per node | 2,500 [3][4] |
Number of pods per core | There is no default value. |
Number of namespaces [5] | 10,000 |
Number of builds | 10,000 (Default pod RAM 512 Mi) - Source-to-Image (S2I) build strategy |
Number of pods per namespace [6] | 25,000 |
Number of routes and back ends per Ingress Controller | 2,000 per router |
Number of secrets | 80,000 |
Number of config maps | 90,000 |
Number of services [7] | 10,000 |
Number of services per namespace | 5,000 |
Number of back-ends per service | 5,000 |
Number of deployments per namespace [6] | 2,000 |
Number of build configs | 12,000 |
Number of custom resource definitions (CRD) | 1,024 [8] |
- Pause pods were deployed to stress the control plane components of OpenShift Container Platform at 2000 node scale. The ability to scale to similar numbers will vary depending upon specific deployment and workload parameters.
- The pod count displayed here is the number of test pods. The actual number of pods depends on the application’s memory, CPU, and storage requirements.
-
This was tested on a cluster with 31 servers: 3 control planes, 2 infrastructure nodes, and 26 worker nodes. If you need 2,500 user pods, you need both a
hostPrefix
of20
, which allocates a network large enough for each node to contain more than 2000 pods, and a custom kubelet config withmaxPods
set to2500
. For more information, see Running 2500 pods per node on OCP 4.13. -
The maximum tested pods per node is 2,500 for clusters using the
OVNKubernetes
network plugin. The maximum tested pods per node for theOpenShiftSDN
network plugin is 500 pods. - When there are a large number of active projects, etcd might suffer from poor performance if the keyspace grows excessively large and exceeds the space quota. Periodic maintenance of etcd, including defragmentation, is highly recommended to free etcd storage.
- There are several control loops in the system that must iterate over all objects in a given namespace as a reaction to some changes in state. Having a large number of objects of a given type in a single namespace can make those loops expensive and slow down processing given state changes. The limit assumes that the system has enough CPU, memory, and disk to satisfy the application requirements.
-
Each service port and each service back-end has a corresponding entry in
iptables
. The number of back-ends of a given service impact the size of theEndpoints
objects, which impacts the size of data that is being sent all over the system. -
Tested on a cluster with 29 servers: 3 control planes, 2 infrastructure nodes, and 24 worker nodes. The cluster had 500 namespaces. OpenShift Container Platform has a limit of 1,024 total custom resource definitions (CRD), including those installed by OpenShift Container Platform, products integrating with OpenShift Container Platform and user-created CRDs. If there are more than 1,024 CRDs created, then there is a possibility that
oc
command requests might be throttled.
4.1.1. Example scenario
As an example, 500 worker nodes (m5.2xl) were tested, and are supported, using OpenShift Container Platform 4.15, the OVN-Kubernetes network plugin, and the following workload objects:
- 200 namespaces, in addition to the defaults
- 60 pods per node; 30 server and 30 client pods (30k total)
- 57 image streams/ns (11.4k total)
- 15 services/ns backed by the server pods (3k total)
- 15 routes/ns backed by the previous services (3k total)
- 20 secrets/ns (4k total)
- 10 config maps/ns (2k total)
- 6 network policies/ns, including deny-all, allow-from ingress and intra-namespace rules
- 57 builds/ns
The following factors are known to affect cluster workload scaling, positively or negatively, and should be factored into the scale numbers when planning a deployment. For additional information and guidance, contact your sales representative or Red Hat support.
- Number of pods per node
- Number of containers per pod
- Type of probes used (for example, liveness/readiness, exec/http)
- Number of network policies
- Number of projects, or namespaces
- Number of image streams per project
- Number of builds per project
- Number of services/endpoints and type
- Number of routes
- Number of shards
- Number of secrets
- Number of config maps
Rate of API calls, or the cluster “churn”, which is an estimation of how quickly things change in the cluster configuration.
-
Prometheus query for pod creation requests per second over 5 minute windows:
sum(irate(apiserver_request_count{resource="pods",verb="POST"}[5m]))
-
Prometheus query for all API requests per second over 5 minute windows:
sum(irate(apiserver_request_count{}[5m]))
-
Prometheus query for pod creation requests per second over 5 minute windows:
- Cluster node resource consumption of CPU
- Cluster node resource consumption of memory
4.2. OpenShift Container Platform environment and configuration on which the cluster maximums are tested
4.2.1. AWS cloud platform
Node | Flavor | vCPU | RAM(GiB) | Disk type | Disk size(GiB)/IOS | Count | Region |
---|---|---|---|---|---|---|---|
Control plane/etcd [1] | r5.4xlarge | 16 | 128 | gp3 | 220 | 3 | us-west-2 |
Infra [2] | m5.12xlarge | 48 | 192 | gp3 | 100 | 3 | us-west-2 |
Workload [3] | m5.4xlarge | 16 | 64 | gp3 | 500 [4] | 1 | us-west-2 |
Compute | m5.2xlarge | 8 | 32 | gp3 | 100 | 3/25/250/500 [5] | us-west-2 |
- gp3 disks with a baseline performance of 3000 IOPS and 125 MiB per second are used for control plane/etcd nodes because etcd is latency sensitive. gp3 volumes do not use burst performance.
- Infra nodes are used to host Monitoring, Ingress, and Registry components to ensure they have enough resources to run at large scale.
- Workload node is dedicated to run performance and scalability workload generators.
- Larger disk size is used so that there is enough space to store the large amounts of data that is collected during the performance and scalability test run.
- Cluster is scaled in iterations and performance and scalability tests are executed at the specified node counts.
4.2.2. IBM Power platform
Node | vCPU | RAM(GiB) | Disk type | Disk size(GiB)/IOS | Count |
---|---|---|---|---|---|
Control plane/etcd [1] | 16 | 32 | io1 | 120 / 10 IOPS per GiB | 3 |
Infra [2] | 16 | 64 | gp2 | 120 | 2 |
Workload [3] | 16 | 256 | gp2 | 120 [4] | 1 |
Compute | 16 | 64 | gp2 | 120 | 2 to 100 [5] |
- io1 disks with 120 / 10 IOPS per GiB are used for control plane/etcd nodes as etcd is I/O intensive and latency sensitive.
- Infra nodes are used to host Monitoring, Ingress, and Registry components to ensure they have enough resources to run at large scale.
- Workload node is dedicated to run performance and scalability workload generators.
- Larger disk size is used so that there is enough space to store the large amounts of data that is collected during the performance and scalability test run.
- Cluster is scaled in iterations.
4.2.3. IBM Z platform
Node | vCPU [4] | RAM(GiB)[5] | Disk type | Disk size(GiB)/IOS | Count |
---|---|---|---|---|---|
Control plane/etcd [1,2] | 8 | 32 | ds8k | 300 / LCU 1 | 3 |
Compute [1,3] | 8 | 32 | ds8k | 150 / LCU 2 | 4 nodes (scaled to 100/250/500 pods per node) |
- Nodes are distributed between two logical control units (LCUs) to optimize disk I/O load of the control plane/etcd nodes as etcd is I/O intensive and latency sensitive. Etcd I/O demand should not interfere with other workloads.
- Four compute nodes are used for the tests running several iterations with 100/250/500 pods at the same time. First, idling pods were used to evaluate if pods can be instanced. Next, a network and CPU demanding client/server workload were used to evaluate the stability of the system under stress. Client and server pods were pairwise deployed and each pair was spread over two compute nodes.
- No separate workload node was used. The workload simulates a microservice workload between two compute nodes.
- Physical number of processors used is six Integrated Facilities for Linux (IFLs).
- Total physical memory used is 512 GiB.
4.3. How to plan your environment according to tested cluster maximums
Oversubscribing the physical resources on a node affects resource guarantees the Kubernetes scheduler makes during pod placement. Learn what measures you can take to avoid memory swapping.
Some of the tested maximums are stretched only in a single dimension. They will vary when many objects are running on the cluster.
The numbers noted in this documentation are based on Red Hat’s test methodology, setup, configuration, and tunings. These numbers can vary based on your own individual setup and environments.
While planning your environment, determine how many pods are expected to fit per node:
required pods per cluster / pods per node = total number of nodes needed
The default maximum number of pods per node is 250. However, the number of pods that fit on a node is dependent on the application itself. Consider the application’s memory, CPU, and storage requirements, as described in "How to plan your environment according to application requirements".
Example scenario
If you want to scope your cluster for 2200 pods per cluster, you would need at least five nodes, assuming that there are 500 maximum pods per node:
2200 / 500 = 4.4
If you increase the number of nodes to 20, then the pod distribution changes to 110 pods per node:
2200 / 20 = 110
Where:
required pods per cluster / total number of nodes = expected pods per node
OpenShift Container Platform comes with several system pods, such as SDN, DNS, Operators, and others, which run across every worker node by default. Therefore, the result of the above formula can vary.
4.4. How to plan your environment according to application requirements
Consider an example application environment:
Pod type | Pod quantity | Max memory | CPU cores | Persistent storage |
---|---|---|---|---|
apache | 100 | 500 MB | 0.5 | 1 GB |
node.js | 200 | 1 GB | 1 | 1 GB |
postgresql | 100 | 1 GB | 2 | 10 GB |
JBoss EAP | 100 | 1 GB | 1 | 1 GB |
Extrapolated requirements: 550 CPU cores, 450GB RAM, and 1.4TB storage.
Instance size for nodes can be modulated up or down, depending on your preference. Nodes are often resource overcommitted. In this deployment scenario, you can choose to run additional smaller nodes or fewer larger nodes to provide the same amount of resources. Factors such as operational agility and cost-per-instance should be considered.
Node type | Quantity | CPUs | RAM (GB) |
---|---|---|---|
Nodes (option 1) | 100 | 4 | 16 |
Nodes (option 2) | 50 | 8 | 32 |
Nodes (option 3) | 25 | 16 | 64 |
Some applications lend themselves well to overcommitted environments, and some do not. Most Java applications and applications that use huge pages are examples of applications that would not allow for overcommitment. That memory can not be used for other applications. In the example above, the environment would be roughly 30 percent overcommitted, a common ratio.
The application pods can access a service either by using environment variables or DNS. If using environment variables, for each active service the variables are injected by the kubelet when a pod is run on a node. A cluster-aware DNS server watches the Kubernetes API for new services and creates a set of DNS records for each one. If DNS is enabled throughout your cluster, then all pods should automatically be able to resolve services by their DNS name. Service discovery using DNS can be used in case you must go beyond 5000 services. When using environment variables for service discovery, the argument list exceeds the allowed length after 5000 services in a namespace, then the pods and deployments will start failing. Disable the service links in the deployment’s service specification file to overcome this:
--- apiVersion: template.openshift.io/v1 kind: Template metadata: name: deployment-config-template creationTimestamp: annotations: description: This template will create a deploymentConfig with 1 replica, 4 env vars and a service. tags: '' objects: - apiVersion: apps.openshift.io/v1 kind: DeploymentConfig metadata: name: deploymentconfig${IDENTIFIER} spec: template: metadata: labels: name: replicationcontroller${IDENTIFIER} spec: enableServiceLinks: false containers: - name: pause${IDENTIFIER} image: "${IMAGE}" ports: - containerPort: 8080 protocol: TCP env: - name: ENVVAR1_${IDENTIFIER} value: "${ENV_VALUE}" - name: ENVVAR2_${IDENTIFIER} value: "${ENV_VALUE}" - name: ENVVAR3_${IDENTIFIER} value: "${ENV_VALUE}" - name: ENVVAR4_${IDENTIFIER} value: "${ENV_VALUE}" resources: {} imagePullPolicy: IfNotPresent capabilities: {} securityContext: capabilities: {} privileged: false restartPolicy: Always serviceAccount: '' replicas: 1 selector: name: replicationcontroller${IDENTIFIER} triggers: - type: ConfigChange strategy: type: Rolling - apiVersion: v1 kind: Service metadata: name: service${IDENTIFIER} spec: selector: name: replicationcontroller${IDENTIFIER} ports: - name: serviceport${IDENTIFIER} protocol: TCP port: 80 targetPort: 8080 clusterIP: '' type: ClusterIP sessionAffinity: None status: loadBalancer: {} parameters: - name: IDENTIFIER description: Number to append to the name of resources value: '1' required: true - name: IMAGE description: Image to use for deploymentConfig value: gcr.io/google-containers/pause-amd64:3.0 required: false - name: ENV_VALUE description: Value to use for environment variables generate: expression from: "[A-Za-z0-9]{255}" required: false labels: template: deployment-config-template
The number of application pods that can run in a namespace is dependent on the number of services and the length of the service name when the environment variables are used for service discovery. ARG_MAX
on the system defines the maximum argument length for a new process and it is set to 2097152 bytes (2 MiB) by default. The Kubelet injects environment variables in to each pod scheduled to run in the namespace including:
-
<SERVICE_NAME>_SERVICE_HOST=<IP>
-
<SERVICE_NAME>_SERVICE_PORT=<PORT>
-
<SERVICE_NAME>_PORT=tcp://<IP>:<PORT>
-
<SERVICE_NAME>_PORT_<PORT>_TCP=tcp://<IP>:<PORT>
-
<SERVICE_NAME>_PORT_<PORT>_TCP_PROTO=tcp
-
<SERVICE_NAME>_PORT_<PORT>_TCP_PORT=<PORT>
-
<SERVICE_NAME>_PORT_<PORT>_TCP_ADDR=<ADDR>
The pods in the namespace will start to fail if the argument length exceeds the allowed value and the number of characters in a service name impacts it. For example, in a namespace with 5000 services, the limit on the service name is 33 characters, which enables you to run 5000 pods in the namespace.
Chapter 5. Using quotas and limit ranges
A resource quota, defined by a ResourceQuota
object, provides constraints that limit aggregate resource consumption per project. It can limit the quantity of objects that can be created in a project by type, as well as the total amount of compute resources and storage that may be consumed by resources in that project.
Using quotas and limit ranges, cluster administrators can set constraints to limit the number of objects or amount of compute resources that are used in your project. This helps cluster administrators better manage and allocate resources across all projects, and ensure that no projects are using more than is appropriate for the cluster size.
Quotas are set by cluster administrators and are scoped to a given project. OpenShift Container Platform project owners can change quotas for their project, but not limit ranges. OpenShift Container Platform users cannot modify quotas or limit ranges.
The following sections help you understand how to check on your quota and limit range settings, what sorts of things they can constrain, and how you can request or limit compute resources in your own pods and containers.
5.1. Resources managed by quota
A resource quota, defined by a ResourceQuota
object, provides constraints that limit aggregate resource consumption per project. It can limit the quantity of objects that can be created in a project by type, as well as the total amount of compute resources and storage that may be consumed by resources in that project.
The following describes the set of compute resources and object types that may be managed by a quota.
A pod is in a terminal state if status.phase
is Failed
or Succeeded
.
Resource Name | Description |
---|---|
|
The sum of CPU requests across all pods in a non-terminal state cannot exceed this value. |
|
The sum of memory requests across all pods in a non-terminal state cannot exceed this value. |
|
The sum of local ephemeral storage requests across all pods in a non-terminal state cannot exceed this value. |
|
The sum of CPU requests across all pods in a non-terminal state cannot exceed this value. |
|
The sum of memory requests across all pods in a non-terminal state cannot exceed this value. |
|
The sum of ephemeral storage requests across all pods in a non-terminal state cannot exceed this value. |
| The sum of CPU limits across all pods in a non-terminal state cannot exceed this value. |
| The sum of memory limits across all pods in a non-terminal state cannot exceed this value. |
| The sum of ephemeral storage limits across all pods in a non-terminal state cannot exceed this value. This resource is available only if you enabled the ephemeral storage technology preview. This feature is disabled by default. |
Resource Name | Description |
---|---|
| The sum of storage requests across all persistent volume claims in any state cannot exceed this value. |
| The total number of persistent volume claims that can exist in the project. |
| The sum of storage requests across all persistent volume claims in any state that have a matching storage class, cannot exceed this value. |
| The total number of persistent volume claims with a matching storage class that can exist in the project. |
Resource Name | Description |
---|---|
| The total number of pods in a non-terminal state that can exist in the project. |
| The total number of replication controllers that can exist in the project. |
| The total number of resource quotas that can exist in the project. |
| The total number of services that can exist in the project. |
| The total number of secrets that can exist in the project. |
|
The total number of |
| The total number of persistent volume claims that can exist in the project. |
| The total number of image streams that can exist in the project. |
You can configure an object count quota for these standard namespaced resource types using the count/<resource>.<group>
syntax.
$ oc create quota <name> --hard=count/<resource>.<group>=<quota> 1
- 1
<resource>
is the name of the resource, and<group>
is the API group, if applicable. Use thekubectl api-resources
command for a list of resources and their associated API groups.
5.1.1. Setting resource quota for extended resources
Overcommitment of resources is not allowed for extended resources, so you must specify requests
and limits
for the same extended resource in a quota. Currently, only quota items with the prefix requests.
are allowed for extended resources. The following is an example scenario of how to set resource quota for the GPU resource nvidia.com/gpu
.
Procedure
To determine how many GPUs are available on a node in your cluster, use the following command:
$ oc describe node ip-172-31-27-209.us-west-2.compute.internal | egrep 'Capacity|Allocatable|gpu'
Example output
openshift.com/gpu-accelerator=true Capacity: nvidia.com/gpu: 2 Allocatable: nvidia.com/gpu: 2 nvidia.com/gpu: 0 0
In this example, 2 GPUs are available.
Use this command to set a quota in the namespace
nvidia
. In this example, the quota is1
:$ cat gpu-quota.yaml
Example output
apiVersion: v1 kind: ResourceQuota metadata: name: gpu-quota namespace: nvidia spec: hard: requests.nvidia.com/gpu: 1
Create the quota with the following command:
$ oc create -f gpu-quota.yaml
Example output
resourcequota/gpu-quota created
Verify that the namespace has the correct quota set using the following command:
$ oc describe quota gpu-quota -n nvidia
Example output
Name: gpu-quota Namespace: nvidia Resource Used Hard -------- ---- ---- requests.nvidia.com/gpu 0 1
Run a pod that asks for a single GPU with the following command:
$ oc create pod gpu-pod.yaml
Example output
apiVersion: v1 kind: Pod metadata: generateName: gpu-pod-s46h7 namespace: nvidia spec: restartPolicy: OnFailure containers: - name: rhel7-gpu-pod image: rhel7 env: - name: NVIDIA_VISIBLE_DEVICES value: all - name: NVIDIA_DRIVER_CAPABILITIES value: "compute,utility" - name: NVIDIA_REQUIRE_CUDA value: "cuda>=5.0" command: ["sleep"] args: ["infinity"] resources: limits: nvidia.com/gpu: 1
Verify that the pod is running bwith the following command:
$ oc get pods
Example output
NAME READY STATUS RESTARTS AGE gpu-pod-s46h7 1/1 Running 0 1m
Verify that the quota
Used
counter is correct by running the following command:$ oc describe quota gpu-quota -n nvidia
Example output
Name: gpu-quota Namespace: nvidia Resource Used Hard -------- ---- ---- requests.nvidia.com/gpu 1 1
Using the following command, attempt to create a second GPU pod in the
nvidia
namespace. This is technically available on the node because it has 2 GPUs:$ oc create -f gpu-pod.yaml
Example output
Error from server (Forbidden): error when creating "gpu-pod.yaml": pods "gpu-pod-f7z2w" is forbidden: exceeded quota: gpu-quota, requested: requests.nvidia.com/gpu=1, used: requests.nvidia.com/gpu=1, limited: requests.nvidia.com/gpu=1
This
Forbidden
error message occurs because you have a quota of 1 GPU and this pod tried to allocate a second GPU, which exceeds its quota.
5.1.2. Quota scopes
Each quota can have an associated set of scopes. A quota only measures usage for a resource if it matches the intersection of enumerated scopes.
Adding a scope to a quota restricts the set of resources to which that quota can apply. Specifying a resource outside of the allowed set results in a validation error.
Scope | Description |
---|---|
|
Match pods where |
|
Match pods where |
|
Match pods that have best effort quality of service for either |
|
Match pods that do not have best effort quality of service for |
A BestEffort
scope restricts a quota to limiting the following resources:
-
pods
A Terminating
, NotTerminating
, and NotBestEffort
scope restricts a quota to tracking the following resources:
-
pods
-
memory
-
requests.memory
-
limits.memory
-
cpu
-
requests.cpu
-
limits.cpu
-
ephemeral-storage
-
requests.ephemeral-storage
-
limits.ephemeral-storage
Ephemeral storage requests and limits apply only if you enabled the ephemeral storage technology preview. This feature is disabled by default.
Additional resources
See Resources managed by quotas for more on compute resources.
See Quality of Service Classes for more on committing compute resources.
5.2. Admin quota usage
5.2.1. Quota enforcement
After a resource quota for a project is first created, the project restricts the ability to create any new resources that can violate a quota constraint until it has calculated updated usage statistics.
After a quota is created and usage statistics are updated, the project accepts the creation of new content. When you create or modify resources, your quota usage is incremented immediately upon the request to create or modify the resource.
When you delete a resource, your quota use is decremented during the next full recalculation of quota statistics for the project.
A configurable amount of time determines how long it takes to reduce quota usage statistics to their current observed system value.
If project modifications exceed a quota usage limit, the server denies the action, and an appropriate error message is returned to the user explaining the quota constraint violated, and what their currently observed usage stats are in the system.
5.2.2. Requests compared to limits
When allocating compute resources by quota, each container can specify a request and a limit value each for CPU, memory, and ephemeral storage. Quotas can restrict any of these values.
If the quota has a value specified for requests.cpu
or requests.memory
, then it requires that every incoming container make an explicit request for those resources. If the quota has a value specified for limits.cpu
or limits.memory
, then it requires that every incoming container specify an explicit limit for those resources.
5.2.3. Sample resource quota definitions
Example core-object-counts.yaml
apiVersion: v1 kind: ResourceQuota metadata: name: core-object-counts spec: hard: configmaps: "10" 1 persistentvolumeclaims: "4" 2 replicationcontrollers: "20" 3 secrets: "10" 4 services: "10" 5
- 1
- The total number of
ConfigMap
objects that can exist in the project. - 2
- The total number of persistent volume claims (PVCs) that can exist in the project.
- 3
- The total number of replication controllers that can exist in the project.
- 4
- The total number of secrets that can exist in the project.
- 5
- The total number of services that can exist in the project.
Example openshift-object-counts.yaml
apiVersion: v1
kind: ResourceQuota
metadata:
name: openshift-object-counts
spec:
hard:
openshift.io/imagestreams: "10" 1
- 1
- The total number of image streams that can exist in the project.
Example compute-resources.yaml
apiVersion: v1 kind: ResourceQuota metadata: name: compute-resources spec: hard: pods: "4" 1 requests.cpu: "1" 2 requests.memory: 1Gi 3 requests.ephemeral-storage: 2Gi 4 limits.cpu: "2" 5 limits.memory: 2Gi 6 limits.ephemeral-storage: 4Gi 7
- 1
- The total number of pods in a non-terminal state that can exist in the project.
- 2
- Across all pods in a non-terminal state, the sum of CPU requests cannot exceed 1 core.
- 3
- Across all pods in a non-terminal state, the sum of memory requests cannot exceed 1Gi.
- 4
- Across all pods in a non-terminal state, the sum of ephemeral storage requests cannot exceed 2Gi.
- 5
- Across all pods in a non-terminal state, the sum of CPU limits cannot exceed 2 cores.
- 6
- Across all pods in a non-terminal state, the sum of memory limits cannot exceed 2Gi.
- 7
- Across all pods in a non-terminal state, the sum of ephemeral storage limits cannot exceed 4Gi.
Example besteffort.yaml
apiVersion: v1 kind: ResourceQuota metadata: name: besteffort spec: hard: pods: "1" 1 scopes: - BestEffort 2
Example compute-resources-long-running.yaml
apiVersion: v1 kind: ResourceQuota metadata: name: compute-resources-long-running spec: hard: pods: "4" 1 limits.cpu: "4" 2 limits.memory: "2Gi" 3 limits.ephemeral-storage: "4Gi" 4 scopes: - NotTerminating 5
- 1
- The total number of pods in a non-terminal state.
- 2
- Across all pods in a non-terminal state, the sum of CPU limits cannot exceed this value.
- 3
- Across all pods in a non-terminal state, the sum of memory limits cannot exceed this value.
- 4
- Across all pods in a non-terminal state, the sum of ephemeral storage limits cannot exceed this value.
- 5
- Restricts the quota to only matching pods where
spec.activeDeadlineSeconds
is set tonil
. Build pods will fall underNotTerminating
unless theRestartNever
policy is applied.
Example compute-resources-time-bound.yaml
apiVersion: v1 kind: ResourceQuota metadata: name: compute-resources-time-bound spec: hard: pods: "2" 1 limits.cpu: "1" 2 limits.memory: "1Gi" 3 limits.ephemeral-storage: "1Gi" 4 scopes: - Terminating 5
- 1
- The total number of pods in a non-terminal state.
- 2
- Across all pods in a non-terminal state, the sum of CPU limits cannot exceed this value.
- 3
- Across all pods in a non-terminal state, the sum of memory limits cannot exceed this value.
- 4
- Across all pods in a non-terminal state, the sum of ephemeral storage limits cannot exceed this value.
- 5
- Restricts the quota to only matching pods where
spec.activeDeadlineSeconds >=0
. For example, this quota would charge for build pods, but not long running pods such as a web server or database.
Example storage-consumption.yaml
apiVersion: v1 kind: ResourceQuota metadata: name: storage-consumption spec: hard: persistentvolumeclaims: "10" 1 requests.storage: "50Gi" 2 gold.storageclass.storage.k8s.io/requests.storage: "10Gi" 3 silver.storageclass.storage.k8s.io/requests.storage: "20Gi" 4 silver.storageclass.storage.k8s.io/persistentvolumeclaims: "5" 5 bronze.storageclass.storage.k8s.io/requests.storage: "0" 6 bronze.storageclass.storage.k8s.io/persistentvolumeclaims: "0" 7
- 1
- The total number of persistent volume claims in a project
- 2
- Across all persistent volume claims in a project, the sum of storage requested cannot exceed this value.
- 3
- Across all persistent volume claims in a project, the sum of storage requested in the gold storage class cannot exceed this value.
- 4
- Across all persistent volume claims in a project, the sum of storage requested in the silver storage class cannot exceed this value.
- 5
- Across all persistent volume claims in a project, the total number of claims in the silver storage class cannot exceed this value.
- 6
- Across all persistent volume claims in a project, the sum of storage requested in the bronze storage class cannot exceed this value. When this is set to
0
, it means bronze storage class cannot request storage. - 7
- Across all persistent volume claims in a project, the sum of storage requested in the bronze storage class cannot exceed this value. When this is set to
0
, it means bronze storage class cannot create claims.
5.2.4. Creating a quota
To create a quota, first define the quota in a file. Then use that file to apply it to a project. See the Additional resources section for a link describing this.
$ oc create -f <resource_quota_definition> [-n <project_name>]
Here is an example using the core-object-counts.yaml
resource quota definition and the demoproject
project name:
$ oc create -f core-object-counts.yaml -n demoproject
5.2.5. Creating object count quotas
You can create an object count quota for all OpenShift Container Platform standard namespaced resource types, such as BuildConfig
, and DeploymentConfig
. An object quota count places a defined quota on all standard namespaced resource types.
When using a resource quota, an object is charged against the quota if it exists in server storage. These types of quotas are useful to protect against exhaustion of storage resources.
To configure an object count quota for a resource, run the following command:
$ oc create quota <name> --hard=count/<resource>.<group>=<quota>,count/<resource>.<group>=<quota>
Example showing object count quota:
$ oc create quota test --hard=count/deployments.extensions=2,count/replicasets.extensions=4,count/pods=3,count/secrets=4 resourcequota "test" created $ oc describe quota test Name: test Namespace: quota Resource Used Hard -------- ---- ---- count/deployments.extensions 0 2 count/pods 0 3 count/replicasets.extensions 0 4 count/secrets 0 4
This example limits the listed resources to the hard limit in each project in the cluster.
5.2.6. Viewing a quota
You can view usage statistics related to any hard limits defined in a project’s quota by navigating in the web console to the project’s Quota
page.
You can also use the CLI to view quota details:
First, get the list of quotas defined in the project. For example, for a project called
demoproject
:$ oc get quota -n demoproject NAME AGE besteffort 11m compute-resources 2m core-object-counts 29m
Describe the quota you are interested in, for example the
core-object-counts
quota:$ oc describe quota core-object-counts -n demoproject Name: core-object-counts Namespace: demoproject Resource Used Hard -------- ---- ---- configmaps 3 10 persistentvolumeclaims 0 4 replicationcontrollers 3 20 secrets 9 10 services 2 10
5.2.7. Configuring quota synchronization period
When a set of resources are deleted, the synchronization time frame of resources is determined by the resource-quota-sync-period
setting in the /etc/origin/master/master-config.yaml
file.
Before quota usage is restored, a user can encounter problems when attempting to reuse the resources. You can change the resource-quota-sync-period
setting to have the set of resources regenerate in the needed amount of time (in seconds) for the resources to be once again available:
Example resource-quota-sync-period
setting
kubernetesMasterConfig: apiLevels: - v1beta3 - v1 apiServerArguments: null controllerArguments: resource-quota-sync-period: - "10s"
After making any changes, restart the controller services to apply them.
$ master-restart api $ master-restart controllers
Adjusting the regeneration time can be helpful for creating resources and determining resource usage when automation is used.
The resource-quota-sync-period
setting balances system performance. Reducing the sync period can result in a heavy load on the controller.
5.2.8. Explicit quota to consume a resource
If a resource is not managed by quota, a user has no restriction on the amount of resource that can be consumed. For example, if there is no quota on storage related to the gold storage class, the amount of gold storage a project can create is unbounded.
For high-cost compute or storage resources, administrators can require an explicit quota be granted to consume a resource. For example, if a project was not explicitly given quota for storage related to the gold storage class, users of that project would not be able to create any storage of that type.
In order to require explicit quota to consume a particular resource, the following stanza should be added to the master-config.yaml.
admissionConfig: pluginConfig: ResourceQuota: configuration: apiVersion: resourcequota.admission.k8s.io/v1alpha1 kind: Configuration limitedResources: - resource: persistentvolumeclaims 1 matchContains: - gold.storageclass.storage.k8s.io/requests.storage 2
In the above example, the quota system intercepts every operation that creates or updates a PersistentVolumeClaim
. It checks what resources controlled by quota would be consumed. If there is no covering quota for those resources in the project, the request is denied. In this example, if a user creates a PersistentVolumeClaim
that uses storage associated with the gold storage class and there is no matching quota in the project, the request is denied.
Additional resources
For examples of how to create the file needed to set quotas, see Resources managed by quotas.
A description of how to allocate compute resources managed by quota.
For information on managing limits