Chapter 3. Telco reference designs
3.1. Telco core reference design specifications
The telco core reference design specification (RDS) describes how to configure an OpenShift Container Platform cluster running on commodity hardware to host telco core workloads.
3.1.1. Telco core RDS 4.18 use model overview
The telco core reference design specification (RDS) describes a platform that supports large-scale telco applications, including control plane functions such as signaling and aggregation. It also includes some centralized data plane functions, such as user plane functions (UPF). These functions generally require scalability, complex networking support, and resilient software-defined storage, and their performance requirements are less stringent and constrained than those of far-edge deployments such as RAN.
3.1.2. About the telco core cluster use model
The telco core cluster use model is designed for clusters that run on commodity hardware. Telco core clusters support large-scale telco applications including control plane functions such as signaling, aggregation, and session border controller (SBC), and centralized data plane functions such as 5G user plane functions (UPF). Telco core cluster functions require scalability, complex networking support, and resilient software-defined storage, and their performance requirements are less stringent and constrained than those of far-edge RAN deployments.
Figure 3.1. Telco core RDS cluster service-based architecture and networking topology

Networking requirements for telco core functions vary widely across a range of networking features and performance points. IPv6 is a requirement and dual-stack is common. Some functions need maximum throughput and transaction rate and require support for user-plane DPDK networking. Other functions use more typical cloud-native patterns and can rely on OVN-Kubernetes, kernel networking, and load balancing.
Telco core clusters are configured as standard with three control plane nodes and two or more worker nodes configured with the stock (non-RT) kernel. In support of workloads with varying networking and performance requirements, you can segment worker nodes by using `MachineConfigPool` custom resources (CRs), for example, for non-user data plane or high-throughput use cases. In support of required telco operational features, core clusters have a standard set of Day 2 OLM-managed Operators installed.
3.1.2.1. Reference design scope
The telco core and telco RAN reference design specifications (RDS) capture the recommended, tested, and supported configurations to get reliable and repeatable performance for clusters running the telco core and telco RAN profiles.
Each RDS includes the released features and supported configurations that are engineered and validated for clusters to run the individual profiles. The configurations provide a baseline OpenShift Container Platform installation that meets feature and KPI targets. Each RDS also describes expected variations for each individual configuration. Validation of each RDS includes many long duration and at-scale tests.
The validated reference configurations are updated for each major Y-stream release of OpenShift Container Platform. Z-stream patch releases are periodically re-tested against the reference configurations.
3.1.2.2. Deviations from the reference design
Deviating from the validated telco core and telco RAN DU reference design specifications (RDS) can have significant impact beyond the specific component or feature that you change. Deviations require analysis and engineering in the context of the complete solution.
All deviations from the RDS should be analyzed and documented with clear action tracking information. Due diligence is expected from partners to understand how to bring deviations into line with the reference design. This might require partners to provide additional resources to engage with Red Hat to work towards enabling their use case to achieve a best in class outcome with the platform. This is critical for the supportability of the solution and ensuring alignment across Red Hat and with partners.
Deviation from the RDS can have some or all of the following consequences:
- It can take longer to resolve issues.
- There is a risk of missing project service-level agreements (SLAs), project deadlines, end provider performance requirements, and so on.
Unapproved deviations may require escalation at executive levels.
Note: Red Hat prioritizes the servicing of requests for deviations based on partner engagement priorities.
3.1.3. Telco core common baseline model
The following configurations and use models are applicable to all telco core use cases. The telco core use cases build on this common baseline of features.
- Cluster topology
Telco core clusters conform to the following requirements:
- High availability control plane (three or more control plane nodes)
- Non-schedulable control plane nodes
- Multiple machine config pools
- Storage
- Telco core use cases require persistent storage as provided by Red Hat OpenShift Data Foundation.
- Networking
Telco core cluster networking conforms to the following requirements:
- Dual stack IPv4/IPv6 (IPv4 primary).
- Fully disconnected – clusters do not have access to public networking at any point in their lifecycle.
- Supports multiple networks. Segmented networking provides isolation between operations, administration and maintenance (OAM), signaling, and storage traffic.
- Cluster network type is OVN-Kubernetes as required for IPv6 support.
Telco core clusters have multiple layers of networking supported by the underlying RHCOS, the SR-IOV Network Operator, the load balancer, and other components. These layers include the following:
- Cluster networking layer. The cluster network configuration is defined and applied through the installation configuration. Update the configuration during Day 2 operations with the NMState Operator. Use the initial configuration to establish the following:
  - Host interface configuration.
  - Active/active bonding (LACP).
- Secondary/additional network layer. Configure the OpenShift Container Platform CNI through the network `additionalNetwork` or `NetworkAttachmentDefinition` CRs. Use the initial configuration to configure MACVLAN virtual network interfaces. A minimal example follows this list.
- Application workload layer. User plane networking runs in cloud-native network functions (CNFs).
- Service Mesh
- Telco CNFs can use Service Mesh. All telco core clusters require a Service Mesh implementation. The choice of implementation and configuration is outside the scope of this specification.
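The following is a minimal sketch of a MACVLAN secondary network defined as a `NetworkAttachmentDefinition` CR, as mentioned in the secondary/additional network layer item above. The network name, namespace, base interface, and IP range are illustrative assumptions only; adapt them to your VLAN layout and workload namespaces.

```yaml
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: macvlan-signaling            # hypothetical secondary network name
  namespace: example-cnf-ns          # assumed workload namespace
spec:
  config: |
    {
      "cniVersion": "0.3.1",
      "type": "macvlan",
      "master": "bond0.100",
      "mode": "bridge",
      "ipam": {
        "type": "whereabouts",
        "range": "192.168.100.0/24"
      }
    }
```

Pods attach to this network by referencing it in the `k8s.v1.cni.cncf.io/networks` annotation; IP addresses are assigned by the Whereabouts IPAM plugin without a DHCP server.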
3.1.4. Telco core cluster common use model engineering considerations
- Cluster workloads are detailed in "Application workloads".
- Worker nodes should run on either of the following CPUs:
  - Intel 3rd Generation Xeon (IceLake) CPUs or better when supported by OpenShift Container Platform, or CPUs with the silicon security bug (Spectre and similar) mitigations turned off. Skylake and older CPUs can experience 40% transaction performance drops when Spectre and similar mitigations are enabled.
  - AMD EPYC Zen 4 CPUs (Genoa, Bergamo, or newer) or better when supported by OpenShift Container Platform.
    Note: Currently, per-pod power management is not available for AMD CPUs.
- IRQ balancing is enabled on worker nodes. The `PerformanceProfile` CR sets `globallyDisableIrqLoadBalancing` to `false`. Guaranteed QoS pods are annotated to ensure isolation as described in "CPU partitioning and performance tuning".
- All cluster nodes should have the following characteristics:
  - Hyper-Threading is enabled.
  - The CPU architecture is x86_64.
  - The stock (non-realtime) kernel is enabled.
  - The node is not configured for workload partitioning.
The balance between power management and maximum performance varies between machine config pools in the cluster. The following configurations should be consistent for all nodes within a machine config pool.
- Cluster scaling. See "Scalability" for more information.
- Clusters should be able to scale to at least 120 nodes.
- CPU partitioning is configured by using a `PerformanceProfile` CR and is applied to nodes on a per-`MachineConfigPool` basis. See "CPU partitioning and performance tuning" for additional considerations. A minimal sketch of such a profile follows this list.
- CPU requirements for OpenShift Container Platform depend on the configured feature set and application workload characteristics. For a cluster configured according to the reference configuration and running a simulated workload of 3000 pods, as created by the kube-burner node-density test, the following CPU requirements are validated:
  - The minimum number of reserved CPUs for control plane and worker nodes is 2 CPUs (4 hyper-threads) per NUMA node.
  - The NICs used for non-DPDK network traffic should be configured to use at least 16 RX/TX queues.
  - Nodes with large numbers of pods or other resources might require additional reserved CPUs. The remaining CPUs are available for user workloads.
  Note: Variations in OpenShift Container Platform configuration, workload size, and workload characteristics require additional analysis to determine the effect on the number of required CPUs for the platform.
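The following is a minimal sketch of a `PerformanceProfile` CR that implements the CPU partitioning and IRQ balancing behavior described above. The profile name, CPU ranges, and node selector are illustrative assumptions; derive the real values from your hardware topology and machine config pool layout rather than copying them verbatim.

```yaml
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  name: worker-profile                      # assumed profile name
spec:
  cpu:
    reserved: "0-1,32-33"                   # example: one full core (2 hyper-threads) per NUMA node for platform tasks
    isolated: "2-31,34-63"                  # remaining CPUs for latency-sensitive workloads
  globallyDisableIrqLoadBalancing: false    # keep IRQ balancing enabled; isolate IRQs per pod with annotations
  nodeSelector:
    node-role.kubernetes.io/worker: ""      # assumed node role / machine config pool
  numa:
    topologyPolicy: single-numa-node        # align CPU and device allocation to a single NUMA node
```

Together, the reserved and isolated ranges must cover every CPU on the node, and core sibling threads must stay in the same pool, as described in "CPU partitioning and performance tuning".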
3.1.4.1. Application workloads
Application workloads running on telco core clusters can include a mix of high performance cloud-native network functions (CNFs) and traditional best-effort or burstable pod workloads.
Guaranteed QoS scheduling is available to pods that require exclusive or dedicated use of CPUs due to performance or security requirements. Typically, pods that run high performance or latency sensitive CNFs by using user plane networking (for example, DPDK) require exclusive use of dedicated whole CPUs achieved through node tuning and guaranteed QoS scheduling. When creating pod configurations that require exclusive CPUs, be aware of the potential implications of hyper-threaded systems. Pods should request multiples of 2 CPUs when the entire core (2 hyper-threads) must be allocated to the pod.
Pods running network functions that do not require high throughput or low latency networking should be scheduled with best-effort or burstable QoS pods and do not require dedicated or isolated CPU cores.
- Engineering considerations
Use the following information to plan telco core workloads and cluster resources:
- CNF applications should conform to the latest version of Red Hat Best Practices for Kubernetes.
- Use a mix of best-effort and burstable QoS pods as required by your applications.
- Use guaranteed QoS pods with proper configuration of reserved or isolated CPUs in the `PerformanceProfile` CR that configures the node. A minimal sketch of such a pod follows this list.
  - Guaranteed QoS pods must include annotations for fully isolating CPUs.
  - Best-effort and burstable pods are not guaranteed exclusive CPU use. Workloads can be preempted by other workloads, operating system daemons, or kernel tasks.
- Use exec probes sparingly and only when no other suitable option is available.
  - Do not use exec probes if a CNF uses CPU pinning. Use other probe implementations, for example, `httpGet` or `tcpSocket`.
  - When you need to use exec probes, limit the exec probe frequency and quantity. The maximum number of exec probes must be kept below 10, and the frequency must not be set to less than 10 seconds.
  - You can use startup probes, because they do not use significant resources at steady-state operation. The limitation on exec probes applies primarily to liveness and readiness probes. Exec probes cause much higher CPU usage on management cores compared to other probe types because they require process forking.
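The following is a minimal sketch of a guaranteed QoS pod that requests whole cores and carries the CPU isolation annotations discussed above, assuming the `worker-profile` PerformanceProfile sketched earlier (the runtime class created by a performance profile typically follows the `performance-<profile-name>` pattern). The pod name, image, and resource values are placeholders.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: dpdk-cnf-example                        # hypothetical workload name
  annotations:
    cpu-load-balancing.crio.io: "disable"       # keep the kernel scheduler off the pinned CPUs
    cpu-quota.crio.io: "disable"                # avoid CFS quota throttling on the pinned CPUs
    irq-load-balancing.crio.io: "disable"       # keep device interrupts off the pinned CPUs
spec:
  runtimeClassName: performance-worker-profile  # assumed runtime class for the worker-profile PerformanceProfile
  containers:
  - name: cnf
    image: registry.example.com/cnf:latest      # placeholder image
    resources:
      requests:
        cpu: "4"                                # multiples of 2 so whole cores (both hyper-threads) are allocated
        memory: 2Gi
      limits:
        cpu: "4"                                # requests equal to limits gives the pod guaranteed QoS
        memory: 2Gi
```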
3.1.4.2. Signaling workloads
Signaling workloads typically use SCTP, REST, gRPC, or similar TCP or UDP protocols. Signaling workloads support hundreds of thousands of transactions per second (TPS) by using a secondary Multus CNI configured as a MACVLAN or SR-IOV interface. These workloads can run in pods with either guaranteed or burstable QoS.
3.1.5. Telco core RDS components
The following sections describe the various OpenShift Container Platform components and configurations that you use to configure and deploy clusters to run telco core workloads.
3.1.5.1. CPU partitioning and performance tuning
- New in this release
- No reference design updates in this release
- Description
- CPU partitioning improves performance and reduces latency by separating sensitive workloads from general-purpose tasks, interrupts, and driver work queues. The CPUs allocated to those auxiliary processes are referred to as reserved in the following sections. In a system with Hyper-Threading enabled, a CPU is one hyper-thread.
- Limits and requirements
- The operating system needs a certain amount of CPU to perform all the support tasks, including kernel networking.
  - A system with just user plane networking applications (DPDK) needs at least one core (2 hyper-threads when enabled) reserved for the operating system and the infrastructure components.
- In a system with Hyper-Threading enabled, core sibling threads must always be in the same pool of CPUs.
- The set of reserved and isolated cores must include all CPU cores.
- Core 0 of each NUMA node must be included in the reserved CPU set.
- Low latency workloads require special configuration to avoid being affected by interrupts, kernel scheduler, or other parts of the platform. For more information, see "Creating a performance profile".
- Engineering considerations
- The minimum reserved capacity (`systemReserved`) required can be found by following the guidance in the "Which amount of CPU and memory are recommended to reserve for the system in OpenShift 4 nodes?" Knowledgebase article.
- The actual required reserved CPU capacity depends on the cluster configuration and workload attributes.
- The reserved CPU value must be rounded up to a full core (2 hyper-threads) alignment.
- Changes to CPU partitioning cause the nodes contained in the relevant machine config pool to be drained and rebooted.
- The reserved CPUs reduce the pod density, because the reserved CPUs are removed from the allocatable capacity of the OpenShift Container Platform node.
- The real-time workload hint should be enabled for real-time capable workloads.
  - Applying the real-time `workloadHints` setting results in the `nohz_full` kernel command-line parameter being applied to improve the performance of high-performance applications. When you apply the `workloadHints` setting, any isolated or burstable pods that do not have the `cpu-quota.crio.io: "disable"` annotation and a proper `runtimeClassName` value are subject to CRI-O rate limiting. When you set the `workloadHints` parameter, be aware of the tradeoff between increased performance and the potential impact of CRI-O rate limiting. Ensure that required pods are correctly annotated.
- Hardware without IRQ affinity support affects isolated CPUs. All server hardware must support IRQ affinity to ensure that pods with guaranteed CPU QoS can fully use allocated CPUs.
- OVS dynamically manages its `cpuset` entry to adapt to network traffic needs. You do not need to reserve an additional CPU for handling high network throughput on the primary CNI.
- If workloads running on the cluster use kernel-level networking, the RX/TX queue count for the participating NICs should be set to 16 or 32 queues if the hardware permits it. Be aware of the default queue count. With no configuration, the default queue count is one RX/TX queue per online CPU, which can result in too many interrupts being allocated.
  Note: Some drivers do not deallocate the interrupts even after reducing the queue count.
- If workloads running on the cluster require cgroup v1, you can configure nodes to use cgroup v1 as part of the initial cluster deployment. See "Enabling Linux control group version 1 (cgroup v1)" and "Red Hat Enterprise Linux 9 changes in the context of Red Hat OpenShift workloads".
  Note: Support for cgroup v1 is planned for removal in OpenShift Container Platform 4.19. Clusters running cgroup v1 must transition to cgroup v2.
3.1.5.2. Service mesh
- Description
- Telco core cloud-native functions (CNFs) typically require a service mesh implementation. Specific service mesh features and performance requirements are dependent on the application. The selection of service mesh implementation and configuration is outside the scope of this documentation. You must account for the impact of service mesh on cluster resource usage and performance, including additional latency introduced in pod networking, in your implementation.
Additional resources
3.1.5.3. Networking
The following diagram describes the telco core reference design networking configuration.
Figure 3.2. Telco core reference design networking configuration

- New in this release
- Support for disabling vendor plugins in the SR-IOV Operator
- Extended telco core RDS validation with MetalLB and EgressIP telco QE validation
- FRR-K8s is now available under the Cluster Network Operator.
  Note: If you have custom `FRRConfiguration` CRs in the `metallb-system` namespace, you must move them under the `openshift-network-operator` namespace.
- Description
- The cluster is configured for dual-stack IP (IPv4 and IPv6).
- The validated physical network configuration consists of two dual-port NICs. One NIC is shared among the primary CNI (OVN-Kubernetes) and IPVLAN and MACVLAN traffic, while the second one is dedicated to SR-IOV VF-based pod traffic.
- A Linux bonding interface (`bond0`) is created in active-active IEEE 802.3ad LACP mode with the two NIC ports attached. The top-of-rack networking equipment must support and be configured for multi-chassis link aggregation (mLAG) technology.
- VLAN interfaces are created on top of `bond0`, including for the primary CNI.
- Bond and VLAN interfaces are created at cluster install time during the network configuration stage of the installation. Except for the `vlan0` VLAN used by the primary CNI, all other VLANs can be created during Day 2 activities with the Kubernetes NMState Operator.
- MACVLAN and IPVLAN interfaces are created with their corresponding CNIs. They do not share the same base interface. For more information, see "Cluster Network Operator".
- SR-IOV VFs are managed by the SR-IOV Network Operator.
- To ensure consistent source IP addresses for pods behind a LoadBalancer Service, configure an `EgressIP` CR and specify the `podSelector` parameter.
- You can implement service traffic separation by doing the following:
  - Configure VLAN interfaces and specific kernel IP routes on the nodes by using `NodeNetworkConfigurationPolicy` CRs.
  - Create a MetalLB `BGPPeer` CR for each VLAN to establish peering with the remote BGP router.
  - Define a MetalLB `BGPAdvertisement` CR to specify which IP address pools should be advertised to a selected list of `BGPPeer` resources.
  The following diagram illustrates how specific service IP addresses are advertised to the outside via specific VLAN interfaces. Service routes are defined in `BGPAdvertisement` CRs and configured with values for the `IPAddressPool1` and `BGPPeer1` fields. A minimal sketch of these CRs follows the diagram.
Figure 3.3. Telco core reference design MetalLB service separation

Additional resources
3.1.5.3.1. Cluster Network Operator
- New in this release
- No reference design updates in this release
- Description
The Cluster Network Operator (CNO) deploys and manages the cluster network components including the default OVN-Kubernetes network plugin during cluster installation. The CNO allows for configuring primary interface MTU settings, OVN gateway modes to use node routing tables for pod egress, and additional secondary networks such as MACVLAN.
In support of network traffic separation, multiple network interfaces are configured through the CNO. Traffic steering to these interfaces is configured through static routes applied by using the NMState Operator. To ensure that pod traffic is properly routed, OVN-Kubernetes is configured with the `routingViaHost` option enabled. This setting uses the kernel routing table and the applied static routes rather than OVN for pod egress traffic.
The Whereabouts CNI plugin is used to provide dynamic IPv4 and IPv6 addressing for additional pod network interfaces without the use of a DHCP server.
- Limits and requirements
- OVN-Kubernetes is required for IPv6 support.
- Large MTU cluster support requires connected network equipment to be set to the same or larger value. MTU size up to 8900 is supported.
- MACVLAN and IPVLAN cannot co-locate on the same main interface due to their reliance on the same underlying kernel mechanism, specifically the `rx_handler`. This handler allows a third-party module to process incoming packets before the host processes them, and only one such handler can be registered per network interface. Since both MACVLAN and IPVLAN need to register their own `rx_handler` to function, they conflict and cannot coexist on the same interface. Review the source code for more details.
- Alternative NIC configurations include splitting the shared NIC into multiple NICs or using a single dual-port NIC, though they have not been tested and validated.
- Clusters with single-stack IP configuration are not validated.
- The `reachabilityTotalTimeoutSeconds` parameter in the `Network` CR configures the `EgressIP` node reachability check total timeout in seconds. The recommended value is `1` second.
- Engineering considerations
- Pod egress traffic is handled by the kernel routing table using the `routingViaHost` option. Appropriate static routes must be configured in the host. A minimal sketch of the relevant `Network` CR settings follows.
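The following sketch shows where the `routingViaHost` and `reachabilityTotalTimeoutSeconds` settings described above live in the cluster `Network` CR. Treat it as an illustration of the field paths rather than a complete configuration.

```yaml
apiVersion: operator.openshift.io/v1
kind: Network
metadata:
  name: cluster
spec:
  defaultNetwork:
    type: OVNKubernetes
    ovnKubernetesConfig:
      gatewayConfig:
        routingViaHost: true                   # pod egress uses the host kernel routing table and static routes
      egressIPConfig:
        reachabilityTotalTimeoutSeconds: 1     # recommended EgressIP node reachability check timeout
```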
Additional resources
3.1.5.3.2. Load balancer
- New in this release
- FRR-K8s is now available under the Cluster Network Operator.
  Important: If you have custom `FRRConfiguration` CRs in the `metallb-system` namespace, you must move them under the `openshift-network-operator` namespace.
- Description
- MetalLB is a load-balancer implementation for bare metal Kubernetes clusters that uses standard routing protocols. It enables a Kubernetes service to get an external IP address which is also added to the host network for the cluster. The MetalLB Operator deploys and manages the lifecycle of a MetalLB instance in a cluster. Some use cases might require features not available in MetalLB, such as stateful load balancing. Where necessary, you can use an external third party load balancer. Selection and configuration of an external load balancer is outside the scope of this specification. When an external third-party load balancer is used, the integration effort must include enough analysis to ensure all performance and resource utilization requirements are met.
- Limits and requirements
- Stateful load balancing is not supported by MetalLB. An alternate load balancer implementation must be used if this is a requirement for workload CNFs.
- You must ensure that the external IP address is routable from clients to the host network for the cluster.
- Engineering considerations
- MetalLB is used in BGP mode only for telco core use models.
- For telco core use models, MetalLB is supported only with the OVN-Kubernetes network provider used in local gateway mode. See `routingViaHost` in "Cluster Network Operator".
- BGP configuration in MetalLB is expected to vary depending on the requirements of the network and peers.
- You can configure address pools with variations in addresses, aggregation length, auto assignment, and so on.
- MetalLB uses BGP for announcing routes only. Only the `transmitInterval` and `minimumTtl` parameters are relevant in this mode. Other parameters in the BFD profile should remain close to the defaults, as shorter values can lead to false negatives and affect performance.
Additional resources
3.1.5.3.3. SR-IOV
- New in this release
- You can now create virtual functions for Mellanox NICs with the SR-IOV Network Operator when secure boot is enabled in the cluster host. Before you can create the virtual functions, you must first skip the firmware configuration for the Mellanox NIC and manually allocate the number of virtual functions in the firmware before switching the system to secure boot.
- Description
- SR-IOV enables physical functions (PFs) to be divided into multiple virtual functions (VFs). VFs can then be assigned to multiple pods to achieve higher throughput performance while keeping the pods isolated. The SR-IOV Network Operator provisions and manages SR-IOV CNI, network device plugin, and other components of the SR-IOV stack.
- Limits and requirements
- Only certain network interfaces are supported. See "Supported devices" for more information.
- Enabling SR-IOV and IOMMU: the SR-IOV Network Operator automatically enables IOMMU on the kernel command line.
- SR-IOV VFs do not receive link state updates from the PF. If a link down detection is required, it must be done at the protocol level.
- `MultiNetworkPolicy` CRs can be applied to `netdevice` networks only. This is because the implementation uses iptables, which cannot manage vfio interfaces.
- Engineering considerations
- SR-IOV interfaces in `vfio` mode are typically used to enable additional secondary networks for applications that require high throughput or low latency. A minimal sketch of the related CRs follows this list.
- The `SriovOperatorConfig` CR must be explicitly created. This CR is included in the reference configuration policies, which causes it to be created during initial deployment.
- NICs that do not support firmware updates with UEFI secure boot or kernel lockdown must be preconfigured with sufficient virtual functions (VFs) enabled to support the number of VFs required by the application workload. For Mellanox NICs, you must disable the Mellanox vendor plugin in the SR-IOV Network Operator. See "Configuring an SR-IOV network device" for more information.
- To change the MTU value of a VF after the pod has started, do not configure the `SriovNetworkNodePolicy` MTU field. Instead, use the Kubernetes NMState Operator to set the MTU of the related PF.
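The following is a minimal sketch of an `SriovNetworkNodePolicy` and matching `SriovNetwork`, assuming a `vfio-pci` device type for a DPDK user plane network. The names, PF name, VF count, and IP range are placeholders to adapt to your hardware and workload namespaces.

```yaml
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: sriov-nnp-du-fh                 # hypothetical policy name
  namespace: openshift-sriov-network-operator
spec:
  deviceType: vfio-pci                  # vfio mode for DPDK; use netdevice for kernel networking
  nicSelector:
    pfNames: ["ens3f0"]                 # assumed PF name; match your hardware
  nodeSelector:
    node-role.kubernetes.io/worker: ""
  numVfs: 8
  priority: 10
  resourceName: du_fh
---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: sriov-net-du-fh
  namespace: openshift-sriov-network-operator
spec:
  resourceName: du_fh                   # binds the network to the VFs created by the policy
  networkNamespace: example-cnf-ns      # assumed namespace where workload pods attach this network
  ipam: |
    {
      "type": "whereabouts",
      "range": "192.168.10.0/24"
    }
```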
3.1.5.3.4. NMState Operator
- New in this release
- No reference design updates in this release
- Description
- The Kubernetes NMState Operator provides a Kubernetes API for performing state-driven network configuration across cluster nodes. It enables network interface configurations, static IPs and DNS, VLANs, trunks, bonding, static routes, MTU, and enabling promiscuous mode on the secondary interfaces. The cluster nodes periodically report on the state of each node’s network interfaces to the API server.
- Limits and requirements
- Not applicable
- Engineering considerations
- Initial networking configuration is applied by using `NMStateConfig` content in the installation CRs. The NMState Operator is used only when required for network updates.
- When SR-IOV virtual functions are used for host networking, the NMState Operator, through `NodeNetworkConfigurationPolicy` CRs, is used to configure VF interfaces, such as VLANs and MTU. A minimal sketch of such a policy follows.
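The following is a minimal sketch of a `NodeNetworkConfigurationPolicy` CR that adds a VLAN interface on top of the `bond0` interface described in the networking section. The policy name, VLAN ID, and MTU are illustrative assumptions; align the MTU with your connected network equipment.

```yaml
apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: worker-vlan100-policy            # hypothetical policy name
spec:
  nodeSelector:
    node-role.kubernetes.io/worker: ""
  desiredState:
    interfaces:
    - name: bond0.100                    # assumed VLAN 100 on the bond0 interface
      type: vlan
      state: up
      vlan:
        base-iface: bond0
        id: 100
      mtu: 9000                          # example MTU; must not exceed what the network equipment supports
```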
Additional resources
3.1.5.4. Logging
- New in this release
- No reference design updates in this release
- Description
- The Cluster Logging Operator enables collection and shipping of logs off the node for remote archival and analysis. The reference configuration uses Kafka to ship audit and infrastructure logs to a remote archive.
- Limits and requirements
- Not applicable
- Engineering considerations
- The impact on cluster CPU use is based on the number or size of logs generated and the amount of log filtering configured.
- The reference configuration does not include shipping of application logs. The inclusion of application logs in the configuration requires you to evaluate the application logging rate and have sufficient additional CPU resources allocated to the reserved set.
Additional resources
3.1.5.5. Power Management
- New in this release
- No reference design updates in this release
- Description
- Use the Performance Profile to configure clusters with high power mode, low power mode, or mixed mode. The choice of power mode depends on the characteristics of the workloads running on the cluster, particularly how sensitive they are to latency. Configure the maximum latency for a low-latency pod by using the per-pod power management C-states feature.
- Limits and requirements
- Power configuration relies on appropriate BIOS configuration, for example, enabling C-states and P-states. Configuration varies between hardware vendors.
- Engineering considerations
- Latency: To ensure that latency-sensitive workloads meet requirements, you require a high-power or a per-pod power management configuration. Per-pod power management is only available for Guaranteed QoS pods with dedicated pinned CPUs.
3.1.5.6. Storage
- New in this release
- No reference design updates in this release
- Description
- Cloud native storage services can be provided by Red Hat OpenShift Data Foundation or other third-party solutions.
- OpenShift Data Foundation is a Ceph-based software-defined storage solution for containers. It provides block storage, file system storage, and on-premises object storage, which can be dynamically provisioned for both persistent and non-persistent data requirements. Telco core applications require persistent storage.
  Note: All storage data might not be encrypted in flight. To reduce risk, isolate the storage network from other cluster networks. The storage network must not be reachable, or routable, from other cluster networks. Only nodes directly attached to the storage network should be allowed to gain access to it.
Additional resources
3.1.5.6.1. Red Hat OpenShift Data Foundation
- New in this release
- No reference design updates in this release
- Description
- Red Hat OpenShift Data Foundation is a software-defined storage service for containers. For telco core clusters, storage support is provided by OpenShift Data Foundation storage services running externally to the application workload cluster. OpenShift Data Foundation supports separation of storage traffic using secondary CNI networks.
- Limits and requirements
- In an IPv4/IPv6 dual-stack networking environment, OpenShift Data Foundation uses IPv4 addressing. For more information, see Network requirements.
- Engineering considerations
- OpenShift Data Foundation network traffic should be isolated from other traffic on a dedicated network, for example, by using VLAN isolation.
3.1.5.6.2. Additional storage solutions
You can use other storage solutions to provide persistent storage for telco core clusters. The configuration and integration of these solutions is outside the scope of the reference design specifications (RDS).
Integration of the storage solution into the telco core cluster must include proper sizing and performance analysis to ensure the storage meets overall performance and resource usage requirements.
3.1.5.7. Telco core deployment components
The following sections describe the various OpenShift Container Platform components and configurations that you use to configure the hub cluster with Red Hat Advanced Cluster Management (RHACM).
3.1.5.7.1. Red Hat Advanced Cluster Management
- New in this release
- No reference design updates in this release
- Description
Red Hat Advanced Cluster Management (RHACM) provides Multi Cluster Engine (MCE) installation and ongoing GitOps ZTP lifecycle management for deployed clusters. You manage cluster configuration and upgrades declaratively by applying `Policy` custom resources (CRs) to clusters during maintenance windows.
You apply policies with the RHACM policy controller as managed by the Topology Aware Lifecycle Manager. Configuration, upgrades, and cluster status are managed through the policy controller.
When installing managed clusters, RHACM applies labels and initial ignition configuration to individual nodes in support of custom disk partitioning, allocation of roles, and allocation to machine config pools. You define these configurations with `SiteConfig` or `ClusterInstance` CRs.
- Limits and requirements
- Hub cluster sizing is discussed in Sizing your cluster.
- RHACM scaling limits are described in Performance and Scalability.
- Engineering considerations
- When managing multiple clusters with unique content per installation, site, or deployment, using RHACM hub templating is strongly recommended. RHACM hub templating allows you to apply a consistent set of policies to clusters while providing for unique values per installation.
3.1.5.7.2. Topology Aware Lifecycle Manager
- New in this release
- No reference design updates in this release.
- Description
Topology Aware Lifecycle Manager is an Operator which runs only on the hub cluster. TALM manages how changes including cluster and Operator upgrades, configurations, and so on, are rolled out to managed clusters in the network. TALM has the following core features:
- Provides sequenced updates of cluster configurations and upgrades (OpenShift Container Platform and Operators) as defined by cluster policies.
- Provides for deferred application of cluster updates.
- Supports progressive rollout of policy updates to sets of clusters in user configurable batches.
- Allows for per-cluster actions by adding `ztp-done` or similar user-defined labels to clusters.
- Limits and requirements
- Supports concurrent cluster deployments in batches of 400.
- Engineering considerations
- Only policies with the `ran.openshift.io/ztp-deploy-wave` annotation are applied by TALM during initial cluster installation.
- Any policy can be remediated by TALM under the control of a user-created `ClusterGroupUpgrade` CR.
Additional resources
3.1.5.7.3. GitOps Operator and GitOps ZTP plugins
- New in this release
- No reference design updates in this release
- Description
The GitOps Operator provides a GitOps-driven infrastructure for managing cluster deployment and configuration. Cluster definitions and configuration are maintained in a Git repository.
ZTP plugins provide support for generating installation CRs from `SiteConfig` CRs and automatically wrapping configuration CRs in policies based on RHACM `PolicyGenerator` CRs.
The SiteConfig Operator provides improved support for generating installation CRs from `ClusterInstance` CRs.
Important: Where possible, use `ClusterInstance` CRs for cluster installation instead of the `SiteConfig` with GitOps ZTP plugin method.
You should structure the Git repository according to release version, with all necessary artifacts (`SiteConfig`, `ClusterInstance`, `PolicyGenerator`, and `PolicyGenTemplate` CRs, and supporting reference CRs) included. This enables deploying and managing multiple versions of the OpenShift Container Platform platform and configuration versions to clusters simultaneously and through upgrades.
The recommended Git structure keeps reference CRs in a directory separate from customer- or partner-provided content. This means that you can import reference updates by simply overwriting existing content. Customer- or partner-supplied CRs can be provided in a parallel directory to the reference CRs for easy inclusion in the generated configuration policies.
- Limits and requirements
- Each ArgoCD application supports up to 300 nodes. Multiple ArgoCD applications can be used to achieve the maximum number of clusters supported by a single hub cluster.
- The `SiteConfig` CR must use the `extraManifests.searchPaths` field to reference the reference manifests.
  Note: Since OpenShift Container Platform 4.15, the `spec.extraManifestPath` field is deprecated.
- Engineering considerations
- Set the `MachineConfigPool` (`mcp`) CR `paused` field to `true` during a cluster upgrade maintenance window and set the `maxUnavailable` field to the maximum tolerable value. This prevents multiple cluster node reboots during the upgrade, which results in a shorter overall upgrade. When you unpause the `mcp` CR, all the configuration changes are applied with a single reboot.
  Note: During installation, custom `mcp` CRs can be paused along with setting `maxUnavailable` to 100% to improve installation times.
- To avoid confusion or unintentional overwriting when updating content, you should use unique and distinguishable names for custom CRs in the `reference-crs/` directory under core-overlay and extra manifests in Git.
- The `SiteConfig` CR allows multiple extra-manifest paths. When file names overlap in multiple directory paths, the last file found in the directory order list takes precedence.
3.1.5.7.4. Monitoring
- New in this release
- No reference design updates in this release
- Description
The Cluster Monitoring Operator (CMO) is included by default in OpenShift Container Platform and provides monitoring (metrics, dashboards, and alerting) for the platform components and, optionally, user projects. You can customize the default retention period, add custom alert rules, and so on. The default handling of pod CPU and memory metrics, based on upstream Kubernetes and cAdvisor, makes a tradeoff that favors stale data over metric accuracy. This leads to spikes in reporting, which can create false alerts, depending on the user-specified thresholds. OpenShift Container Platform supports an opt-in Dedicated Service Monitor feature that creates an additional set of pod CPU and memory metrics that do not suffer from this behavior. For more information, see Dedicated Service Monitors - Questions and Answers (Red Hat Knowledgebase).
In addition to the default configuration, the following metrics are expected to be configured for telco core clusters:
- Pod CPU and memory metrics and alerts for user workloads
- Limits and requirements
- You must enable the Dedicated Service Monitor feature to represent pod metrics accurately.
- Engineering considerations
- The Prometheus retention period is specified by the user. The value used is a tradeoff between operational requirements for maintaining historical data on the cluster against CPU and storage resources. Longer retention periods increase the need for storage and require additional CPU to manage data indexing.
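A common way to set the retention period described above is through the `cluster-monitoring-config` ConfigMap. The following is a minimal sketch; the retention and storage values are placeholders to balance against your CPU and storage budget.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      retention: 15d                 # example retention; longer periods need more storage and CPU for indexing
      volumeClaimTemplate:
        spec:
          resources:
            requests:
              storage: 100Gi         # example persistent storage size for metrics
```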
Additional resources
3.1.5.8. Scheduling
- New in this release
- No reference design updates in this release
- Description
The scheduler is a cluster-wide component responsible for selecting the correct node for a given workload. It is a core part of the platform and does not require any specific configuration in the common deployment scenarios. However, a few specific use cases are described in the following section.
NUMA-aware scheduling can be enabled through the NUMA Resources Operator. For more information, see "Scheduling NUMA-aware workloads".
- Limits and requirements
- The default scheduler does not understand the NUMA locality of workloads. It only knows about the sum of all free resources on a worker node. This might cause workloads to be rejected when scheduled to a node with the topology manager policy set to `single-numa-node` or `restricted`. For more information, see "Topology Manager policies".
  - For example, consider a pod requesting 6 CPUs that is scheduled to an empty node that has 4 CPUs per NUMA node. The total allocatable capacity of the node is 8 CPUs. The scheduler places the pod on the empty node, but node-local admission fails because only 4 CPUs are available in each NUMA node.
- All clusters with multi-NUMA nodes are required to use the NUMA Resources Operator. See "Installing the NUMA Resources Operator" for more information. Use the `machineConfigPoolSelector` field in the `KubeletConfig` CR to select all nodes where NUMA-aligned scheduling is required.
- All machine config pools must have consistent hardware configurations. For example, all nodes are expected to have the same NUMA zone count.
- Engineering considerations
- Pods might require annotations for correct scheduling and isolation. For more information about annotations, see "CPU partitioning and performance tuning".
- You can configure SR-IOV virtual function NUMA affinity to be ignored during scheduling by using the `excludeTopology` field in the `SriovNetworkNodePolicy` CR, as in the sketch that follows.
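A minimal sketch of the `excludeTopology` setting, assuming an otherwise standard node policy; the policy name, PF name, and resource name are placeholders.

```yaml
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: sriov-nnp-exclude-topology      # hypothetical policy name
  namespace: openshift-sriov-network-operator
spec:
  deviceType: netdevice
  excludeTopology: true                 # VF NUMA affinity is not reported to the Topology Manager
  nicSelector:
    pfNames: ["ens3f1"]                 # assumed PF name
  nodeSelector:
    node-role.kubernetes.io/worker: ""
  numVfs: 8
  resourceName: vfs_no_numa
```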
Additional resources
3.1.5.9. Node Configuration
- New in this release
- No reference design updates in this release
- Limits and requirements
- Analyze additional kernel modules to determine their impact on CPU load, system performance, and the ability to meet KPIs.

Table 3.1. Additional kernel modules

Feature | Description |
---|---|
Additional kernel modules | Install the following kernel modules by using `MachineConfig` CRs to provide extended kernel functionality to CNFs: sctp, ip_gre, ip6_tables, ip6t_REJECT, ip6table_filter, ip6table_mangle, iptable_filter, iptable_mangle, iptable_nat, xt_multiport, xt_owner, xt_REDIRECT, xt_statistic, xt_TCPMSS. A minimal sketch of such a CR follows the table. |
Container mount namespace hiding | Reduces the frequency of kubelet housekeeping and eviction monitoring to reduce CPU usage. Creates a container mount namespace, visible to kubelet and CRI-O, to reduce system mount scanning overhead. |
Kdump enable | Optional configuration (enabled by default). |
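The following is a minimal sketch of a `MachineConfig` CR that loads one of the listed modules (sctp) on worker nodes by writing a modules-load.d entry. The CR name is an assumption; the reference configuration provides equivalent CRs for control plane and worker nodes and for the other modules.

```yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 40-load-sctp-module-worker          # assumed name
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      files:
      - path: /etc/modules-load.d/sctp-load.conf
        mode: 420                            # 0644
        overwrite: true
        contents:
          source: data:,sctp                 # module name loaded at boot by systemd-modules-load
```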
3.1.5.10. Host firmware and boot loader configuration
- New in this release
- No reference design updates in this release
- Engineering considerations
- Enabling secure boot is the recommended configuration.
  Note: When secure boot is enabled, only signed kernel modules are loaded by the kernel. Out-of-tree drivers are not supported.
3.1.5.11. Disconnected environment
- New in this release
- No reference design updates in this release
- Description
Telco core clusters are expected to be installed in networks without direct access to the internet. All container images needed to install, configure, and operate the cluster must be available in a disconnected registry. This includes OpenShift Container Platform images, Day 2 OLM Operator images, and application workload images. The use of a disconnected environment provides multiple benefits, including:
- Security - limiting access to the cluster
- Curated content – the registry is populated based on curated and approved updates for clusters
- Limits and requirements
- A unique name is required for all custom `CatalogSource` resources. Do not reuse the default catalog names. A sketch of a custom catalog source follows the engineering considerations below.
- Engineering considerations
- A valid time source must be configured as part of cluster installation.
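The following is a minimal sketch of a custom `CatalogSource` CR for a disconnected registry, as required above. The name, mirrored index image, registry host, and poll interval are assumptions to adapt to your mirroring setup.

```yaml
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: redhat-operators-disconnected       # unique custom name; do not reuse the default catalog names
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: registry.example.com:5000/redhat/redhat-operator-index:v4.18   # assumed mirrored index image
  displayName: Disconnected Red Hat Operators
  publisher: Example Org
  updateStrategy:
    registryPoll:
      interval: 30m                          # how often the mirrored index is re-polled
```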
Additional resources
3.1.5.12. Agent-based Installer
- New in this release
- No reference design updates in this release
- Description
Telco core clusters can be installed by using the Agent-based Installer. This method allows you to install OpenShift on bare-metal servers without requiring additional servers or VMs for managing the installation. The Agent-based Installer can be run on any system (for example, from a laptop) to generate an ISO installation image. The ISO is used as the installation media for the cluster supervisor nodes. Installation progress can be monitored using the ABI tool from any system with network connectivity to the supervisor node’s API interfaces.
ABI supports the following:
- Installation from declarative CRs
- Installation in disconnected environments
- Installation with no additional supporting install or bastion servers required to complete the installation
- Limits and requirements
- Disconnected installation requires a registry that is reachable from the installed host, with all required content mirrored in that registry.
- Engineering considerations
- Networking configuration should be applied as NMState configuration during installation. Day 2 networking configuration using the NMState Operator is not supported.
Additional resources
3.1.5.13. Security
- New in this release
- Description
Telco customers are security conscious and require clusters to be hardened against multiple attack vectors. In OpenShift Container Platform, there is no single component or feature responsible for securing a cluster. Use the following security-oriented features and configurations to secure your clusters:
- SecurityContextConstraints (SCC): All workload pods should be run with the `restricted-v2` or `restricted` SCC.
- Seccomp: All pods should run with the `RuntimeDefault` (or stronger) seccomp profile.
- Rootless DPDK pods: Many user-plane networking (DPDK) CNFs require pods to run with root privileges. With this feature, a conformant DPDK pod can run without requiring root privileges. Rootless DPDK pods create a tap device in a rootless pod that injects traffic from a DPDK application to the kernel.
- Storage: The storage network should be isolated and non-routable to other cluster networks. See the "Storage" section for additional details.
Refer to Custom nftable firewall rules in OpenShift for a supported method of implementing custom nftables firewall rules in OpenShift cluster nodes. This article is intended for cluster administrators who are responsible for managing network security policies in OpenShift environments. It is crucial to carefully consider the operational implications before deploying this method, including:
- Early application: The rules are applied at boot time, before the network is fully operational. Ensure the rules don’t inadvertently block essential services required during the boot process.
- Risk of misconfiguration: Errors in your custom rules can have unintended consequences, such as degraded performance, blocked legitimate traffic, or isolated nodes. Thoroughly test your rules in a non-production environment before deploying them to your main cluster.
- External endpoints: OpenShift requires access to external endpoints to function. For more information about the firewall allowlist, see "Configuring your firewall for OpenShift Container Platform". Ensure that cluster nodes are permitted access to those endpoints.
- Node reboot: Unless node disruption policies are configured, applying the `MachineConfig` CR with the required firewall settings causes a node reboot. Be aware of this impact and schedule a maintenance window accordingly. For more information, see "Using node disruption policies to minimize disruption from machine config changes".
  Note: Node disruption policies are available in OpenShift Container Platform 4.17 and later.
- Network flow matrix: For more information about managing ingress traffic, see "OpenShift Container Platform network flow matrix". You can restrict ingress traffic to essential flows to improve network security. The matrix provides insights into base cluster services but excludes traffic generated by Day-2 Operators.
- Cluster version updates and upgrades: Exercise caution when updating or upgrading OpenShift clusters. Recent changes to the platform’s firewall requirements might require adjustments to network port permissions. Although the documentation provides guidelines, note that these requirements can evolve over time. To minimize disruptions, you should test any updates or upgrades in a staging environment before applying them in production. This helps you to identify and address potential compatibility issues related to firewall configuration changes.
- Limits and requirements
- Rootless DPDK pods require the following additional configuration:
  - Configure the `container_t` SELinux context for the tap plugin.
  - Enable the `container_use_devices` SELinux boolean for the cluster host.
- Engineering considerations
- For rootless DPDK pod support, enable the SELinux `container_use_devices` boolean on the host to allow the tap device to be created. This introduces an acceptable security risk. A minimal sketch of a `MachineConfig` CR that sets the boolean follows.
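The following is a minimal sketch of a `MachineConfig` CR that enables the `container_use_devices` SELinux boolean on worker nodes through a one-shot systemd unit. The CR and unit names are assumptions; the reference configuration ships an equivalent optional CR for the tap CNI use case.

```yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 99-worker-setsebool-container-use-devices   # assumed name
spec:
  config:
    ignition:
      version: 3.2.0
    systemd:
      units:
      - name: setsebool-container-use-devices.service
        enabled: true
        contents: |
          [Unit]
          Description=Set SELinux boolean container_use_devices for rootless DPDK pods

          [Service]
          Type=oneshot
          ExecStart=/usr/sbin/setsebool -P container_use_devices=1
          RemainAfterExit=true

          [Install]
          WantedBy=multi-user.target
```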
3.1.5.14. Scalability
- New in this release
- No reference design updates in this release
- Description
- Scaling of workloads is described in "Application workloads".
- Limits and requirements
- Clusters can scale to at least 120 nodes.
3.1.6. Telco core reference configuration CRs
Use the following custom resources (CRs) to configure and deploy OpenShift Container Platform clusters with the telco core profile. The CRs form the common baseline used in all the specific use models unless otherwise indicated.
3.1.6.1. Extracting the telco core reference design configuration CRs
You can extract the complete set of custom resources (CRs) for the telco core profile from the `telco-core-rds-rhel9` container image. The container image contains both the required CRs and the optional CRs for the telco core profile.
Prerequisites
- You have installed `podman`.
Procedure
Extract the content from the `telco-core-rds-rhel9` container image by running the following commands:
$ mkdir -p ./out
$ podman run -it registry.redhat.io/openshift4/openshift-telco-core-rds-rhel9:v4.18 | base64 -d | tar xv -C out
Verification
The `out` directory has the following directory structure. You can view the telco core CRs in the `out/telco-core-rds/` directory.
Example output
out/
└── telco-core-rds
    ├── configuration
    │   └── reference-crs
    │       ├── optional
    │       │   ├── logging
    │       │   ├── networking
    │       │   │   └── multus
    │       │   │       └── tap_cni
    │       │   ├── other
    │       │   └── tuning
    │       └── required
    │           ├── networking
    │           │   ├── metallb
    │           │   ├── multinetworkpolicy
    │           │   └── sriov
    │           ├── other
    │           ├── performance
    │           ├── scheduling
    │           └── storage
    │               └── odf-external
    └── install
Prerequisites
- You have access to the cluster as a user with the `cluster-admin` role.
- You have credentials to access the `registry.redhat.io` container image registry.
- You installed the `cluster-compare` plugin.
Procedure
Log in to the container image registry with your credentials by running the following command:
$ podman login registry.redhat.io
Additional resources
3.1.6.2. Node configuration reference CRs
Component | Reference CR | Description | Optional |
---|---|---|---|
Additional kernel modules | | Optional. Configures the kernel modules for control plane nodes. | No |
Additional kernel modules | | Optional. Loads the SCTP kernel module in worker nodes. | No |
Additional kernel modules | | Optional. Configures kernel modules for worker nodes. | No |
Container mount namespace hiding | | Configures a mount namespace for sharing container-specific mounts between kubelet and CRI-O on control plane nodes. | No |
Container mount namespace hiding | | Configures a mount namespace for sharing container-specific mounts between kubelet and CRI-O on worker nodes. | No |
Kdump enable | | Configures kdump crash reporting on master nodes. | No |
Kdump enable | | Configures kdump crash reporting on worker nodes. | No |
3.1.6.3. Resource tuning reference CRs
Component | Reference CR | Description | Optional |
---|---|---|---|
System reserved capacity | | Optional. Configures kubelet, enabling auto-sizing reserved resources for the control plane node pool. | No |
3.1.6.4. Networking reference CRs
Component | Reference CR | Description | Optional |
---|---|---|---|
Baseline | | Configures the default cluster network, specifying OVN Kubernetes settings like routing via the host. It also allows the definition of additional networks, including custom CNI configurations, and enables the use of MultiNetworkPolicy CRs for network policies across multiple networks. | No |
Baseline | | Optional. Defines a NetworkAttachmentDefinition resource specifying network configuration details such as node selector and CNI configuration. | Yes |
Load Balancer | | Configures MetalLB to manage a pool of IP addresses with auto-assign enabled for dynamic allocation of IPs from the specified range. | No |
Load Balancer | | Configures bidirectional forwarding detection (BFD) with customized intervals, detection multiplier, and modes for quicker network fault detection and load balancing failover. | No |
Load Balancer | | Defines a BGP advertisement resource for MetalLB, specifying how an IP address pool is advertised to BGP peers. This enables fine-grained control over traffic routing and announcements. | No |
Load Balancer | | Defines a BGP peer in MetalLB, representing a BGP neighbor for dynamic routing. | No |
Load Balancer | | Defines a MetalLB community, which groups one or more BGP communities under a named resource. Communities can be applied to BGP advertisements to control routing policies and change traffic routing. | No |
Load Balancer | | Defines the MetalLB resource in the cluster. | No |
Load Balancer | | Defines the metallb-system namespace in the cluster. | No |
Load Balancer | | Defines the Operator group for the MetalLB Operator. | No |
Load Balancer | | Creates a subscription resource for the metallb Operator with manual approval for install plans. | No |
Multus - Tap CNI for rootless DPDK pods | | Configures a MachineConfig resource which sets an SELinux boolean for the tap CNI plugin on worker nodes. | Yes |
NMState Operator | | Defines an NMState resource that is used by the NMState Operator to manage node network configurations. | No |
NMState Operator | | Creates the NMState Operator namespace. | No |
NMState Operator | | Creates the Operator group in the openshift-nmstate namespace, allowing the NMState Operator to watch and manage resources. | No |
NMState Operator | | Creates a subscription for the NMState Operator, managed through OLM. | No |
SR-IOV Network Operator | | Defines an SR-IOV network specifying network capabilities, IP address management (ipam), and the associated network namespace and resource. | No |
SR-IOV Network Operator | | Configures network policies for SR-IOV devices on specific nodes, including customization of device selection, VF allocation (numVfs), node-specific settings (nodeSelector), and priorities. | No |
SR-IOV Network Operator | | Configures various settings for the SR-IOV Operator, including enabling the injector and Operator webhook, disabling pod draining, and defining the node selector for the configuration daemon. | No |
SR-IOV Network Operator | | Creates a subscription for the SR-IOV Network Operator, managed through OLM. | No |
SR-IOV Network Operator | | Creates the SR-IOV Network Operator subscription namespace. | No |
SR-IOV Network Operator | | Creates the Operator group for the SR-IOV Network Operator, allowing it to watch and manage resources in the target namespace. | No |
3.1.6.5. Scheduling reference CRs
Component | Reference CR | Description | Optional |
---|---|---|---|
NUMA-aware scheduler | | Enables the NUMA Resources Operator, aligning workloads with specific NUMA node configurations. Required for clusters with multi-NUMA nodes. | No |
NUMA-aware scheduler | | Creates a subscription for the NUMA Resources Operator, managed through OLM. Required for clusters with multi-NUMA nodes. | No |
NUMA-aware scheduler | | Creates the NUMA Resources Operator subscription namespace. Required for clusters with multi-NUMA nodes. | No |
NUMA-aware scheduler | | Creates the Operator group in the numaresources-operator namespace, allowing the NUMA Resources Operator to watch and manage resources. Required for clusters with multi-NUMA nodes. | No |
NUMA-aware scheduler | | Configures a topology-aware scheduler in the cluster that can handle NUMA aware scheduling of pods across nodes. | No |
NUMA-aware scheduler | | Configures control plane nodes as non-schedulable for workloads. | No |
3.1.6.6. Storage reference CRs
Component | Reference CR | Description | Optional |
---|---|---|---|
External ODF configuration | | Defines a Secret resource containing base64-encoded configuration data for an external Ceph cluster in the openshift-storage namespace. | No |
External ODF configuration | | Defines an OpenShift Container Storage (OCS) storage resource which configures the cluster to use an external storage back end. | No |
External ODF configuration | | Creates the monitored openshift-storage namespace for the OpenShift Data Foundation Operator. | No |
External ODF configuration | | Creates the Operator group in the openshift-storage namespace, allowing the OpenShift Data Foundation Operator to watch and manage resources. | No |
External ODF configuration | | Creates the subscription for the OpenShift Data Foundation Operator in the openshift-storage namespace. | No |
3.1.7. Telco core reference configuration software specifications
The Red Hat telco core 4.18 solution has been validated using the following Red Hat software products for OpenShift Container Platform clusters.
Component | Software version |
---|---|
Red Hat Advanced Cluster Management (RHACM) | 2.12 [1] |
Cluster Logging Operator | 6.1 [2] |
OpenShift Data Foundation | 4.18 |
SR-IOV Network Operator | 4.18 |
MetalLB | 4.18 |
NMState Operator | 4.18 |
NUMA-aware scheduler | 4.18 |
[1] This table will be updated when the aligned RHACM version 2.13 is released.
[2] This table will be updated when the aligned Cluster Logging Operator 6.2 is released.
3.2. Telco RAN DU reference design specifications
The telco RAN DU reference design specification (RDS) describes the configuration for clusters running on commodity hardware to host 5G workloads in the Radio Access Network (RAN). It captures the recommended, tested, and supported configurations to get reliable and repeatable performance for a cluster running the telco RAN DU profile.
3.2.1. Reference design specifications for telco RAN DU 5G deployments
Red Hat and certified partners offer deep technical expertise and support for networking and operational capabilities required to run telco applications on OpenShift Container Platform 4.18 clusters.
Red Hat’s telco partners require a well-integrated, well-tested, and stable environment that can be replicated at scale for enterprise 5G solutions. The telco core and RAN DU reference design specifications (RDS) outline the recommended solution architecture based on a specific version of OpenShift Container Platform. Each RDS describes a tested and validated platform configuration for telco core and RAN DU use models. The RDS ensures an optimal experience when running your applications by defining the set of critical KPIs for telco 5G core and RAN DU. Following the RDS minimizes high severity escalations and improves application stability.
5G use cases are evolving and your workloads are continually changing. Red Hat is committed to iterating over the telco core and RAN DU RDS to support evolving requirements based on customer and partner feedback.
The reference configuration includes the configuration of the far edge clusters and hub cluster components.
The reference configurations in this document are deployed using a centrally managed hub cluster infrastructure as shown in the following image.
Figure 3.4. Telco RAN DU deployment architecture

3.2.1.1. Reference design scope
The telco core and telco RAN reference design specifications (RDS) capture the recommended, tested, and supported configurations to get reliable and repeatable performance for clusters running the telco core and telco RAN profiles.
Each RDS includes the released features and supported configurations that are engineered and validated for clusters to run the individual profiles. The configurations provide a baseline OpenShift Container Platform installation that meets feature and KPI targets. Each RDS also describes expected variations for each individual configuration. Validation of each RDS includes many long duration and at-scale tests.
The validated reference configurations are updated for each major Y-stream release of OpenShift Container Platform. Z-stream patch releases are periodically re-tested against the reference configurations.
3.2.1.2. Deviations from the reference design
Deviating from the validated telco core and telco RAN DU reference design specifications (RDS) can have significant impact beyond the specific component or feature that you change. Deviations require analysis and engineering in the context of the complete solution.
All deviations from the RDS should be analyzed and documented with clear action tracking information. Due diligence is expected from partners to understand how to bring deviations into line with the reference design. This might require partners to provide additional resources to engage with Red Hat to work towards enabling their use case to achieve a best-in-class outcome with the platform. This is critical for the supportability of the solution and for ensuring alignment across Red Hat and with partners.
Deviation from the RDS can have some or all of the following consequences:
- It can take longer to resolve issues.
- There is a risk of missing project service-level agreements (SLAs), project deadlines, end provider performance requirements, and so on.
- Unapproved deviations may require escalation at executive levels.
Note: Red Hat prioritizes the servicing of requests for deviations based on partner engagement priorities.
3.2.1.3. Engineering considerations for the RAN DU use model
The RAN DU use model configures an OpenShift Container Platform cluster running on commodity hardware for hosting RAN distributed unit (DU) workloads. Model and system level considerations are described below. Specific limits, requirements, and engineering considerations for individual components are detailed in later sections.
For details of the RAN DU KPI test results, see the Telco RAN DU reference design specification KPI test results for OpenShift 4.18. This information is only available to customers and partners.
- Workloads
- DU workloads are described in "Telco RAN DU application workloads".
- DU worker nodes are Intel 3rd Generation Xeon (IceLake) 2.20 GHz or better with host firmware tuned for maximum performance.
- Resources
- The maximum number of running pods in the system, inclusive of application workload and OpenShift Container Platform pods, is 120.
- Resource utilization
OpenShift Container Platform resource utilization varies depending on many factors such as the following application workload characteristics:
- Pod count
- Type and frequency of probes
- Messaging rates on the primary or secondary CNI with kernel networking
- API access rate
- Logging rates
- Storage IOPS
Resource utilization is measured for clusters configured as follows:
- The cluster is a single host with single-node OpenShift installed.
- The cluster runs the representative application workload described in "Reference application workload characteristics".
- The cluster is managed under the constraints detailed in "Hub cluster management characteristics".
- Components noted as "optional" in the use model configuration are not included.
Note: Configurations outside the scope of the RAN DU RDS that do not meet these criteria require additional analysis to determine the impact on resource utilization and the ability to meet KPI targets. You might need to allocate additional cluster resources to meet these requirements.
- Reference application workload characteristics
- Uses 15 pods and 30 containers for the vRAN application including its management and control functions
- Uses an average of 2 ConfigMap and 4 Secret CRs per pod
- Uses a maximum of 10 exec probes with a frequency of not less than 10 seconds
- Incremental application load on the kube-apiserver is less than or equal to 10% of the cluster platform usage
Note: You can extract the CPU load from the platform metrics. For example:
$ query=avg_over_time(pod:container_cpu_usage:sum{namespace="openshift-kube-apiserver"}[30m])
- Application logs are not collected by the platform log collector
- Aggregate traffic on the primary CNI is less than 8 MBps
- Hub cluster management characteristics
RHACM is the recommended cluster management solution and is configured to these limits:
- Use a maximum of 5 RHACM configuration policies with a compliant evaluation interval of not less than 10 minutes.
- Use a minimal number (up to 10) of managed cluster templates in cluster policies. Use hub-side templating.
- Disable RHACM addons with the exception of the policyController and configure observability with the default configuration.
The following table describes resource utilization under reference application load.
Table 3.8. Resource utilization under reference application load
Metric | Limits | Notes |
---|---|---|
OpenShift platform CPU usage | Less than 4000mc – 2 cores (4HT) | Platform CPU is pinned to reserved cores, including both hyper-threads of each reserved core. The system is engineered to 3 CPUs (3000mc) at steady-state to allow for periodic system tasks and spikes. |
OpenShift platform memory | Less than 16G | |
3.2.1.4. Telco RAN DU application workloads
Develop RAN DU applications that are subject to the following requirements and limitations.
- Description and limits
- Develop cloud-native network functions (CNFs) that conform to the latest version of Red Hat best practices for Kubernetes.
- Use SR-IOV for high performance networking.
- Use exec probes sparingly and only when no other suitable options are available.
- Do not use exec probes if a CNF uses CPU pinning. Use other probe implementations, for example, httpGet or tcpSocket. When you need to use exec probes, limit the exec probe frequency and quantity. The maximum number of exec probes must be kept below 10, and the frequency must not be set to less than 10 seconds. Exec probes cause much higher CPU usage on management cores compared to other probe types because they require process forking. A minimal probe sketch follows at the end of this section.
Note: Startup probes require minimal resources during steady-state operation. The limitation on exec probes applies primarily to liveness and readiness probes.
Note: A test workload that conforms to the dimensions of the reference DU application workload described in this specification can be found at openshift-kni/du-test-workloads.
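For example, a CNF that pins CPUs can replace an exec probe with an httpGet or tcpSocket probe. The following is a minimal, hypothetical sketch; the pod name, container name, image, port, and path are placeholder values, not part of the reference configuration:

apiVersion: v1
kind: Pod
metadata:
  name: example-cnf                # hypothetical pod name
spec:
  containers:
  - name: cnf-container            # hypothetical container name
    image: registry.example.com/cnf:latest
    readinessProbe:
      httpGet:                     # avoids the process fork that an exec probe requires
        path: /healthz
        port: 8080
      periodSeconds: 10            # keep probe frequency at 10 seconds or more
    livenessProbe:
      tcpSocket:
        port: 8080
      periodSeconds: 10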
3.2.2. Telco RAN DU reference design components
The following sections describe the various OpenShift Container Platform components and configurations that you use to configure and deploy clusters to run RAN DU workloads.
Figure 3.5. Telco RAN DU reference design components

Ensure that additional components you include that are not specified in the telco RAN DU profile do not affect the CPU resources allocated to workload applications.
Out-of-tree drivers are not supported. 5G RAN application components are not included in the RAN DU profile and must be engineered against the resources (CPU) allocated to applications.
3.2.2.1. Host firmware tuning
- New in this release
- No reference design updates in this release
- Description
Tune host firmware settings for optimal performance during initial cluster deployment. For more information, see "Recommended single-node OpenShift cluster configuration for vDU application workloads". Apply tuning settings in the host firmware during initial deployment. See "Managing host firmware settings with GitOps ZTP" for more information. The managed cluster host firmware settings are available on the hub cluster as individual BareMetalHost custom resources (CRs) that are created when you deploy the managed cluster with the ClusterInstance CR and GitOps ZTP.
Note: Create the ClusterInstance CR based on the provided reference example-sno.yaml CR.
- Limits and requirements
- You must enable Hyper-Threading in the host firmware settings.
- Engineering considerations
- Tune all firmware settings for maximum performance.
- All settings are expected to be for maximum performance unless tuned for power savings.
- You can tune host firmware for power savings at the expense of performance as required.
- Enable secure boot. When secure boot is enabled, only signed kernel modules are loaded by the kernel. Out-of-tree drivers are not supported.
3.2.2.2. CPU partitioning and performance tuning
- New in this release
- No reference design updates in this release
- Description
- The RAN DU use model includes cluster performance tuning via PerformanceProfile CRs, which are reconciled by the Node Tuning Operator. The RAN DU use case requires the cluster to be tuned for low-latency performance. For more details about node tuning with the PerformanceProfile CR, see "Tuning nodes for low latency with the performance profile".
- Limits and requirements
The Node Tuning Operator uses the PerformanceProfile CR to configure the cluster. You need to configure the following settings in the telco RAN DU profile PerformanceProfile CR:
- Set a reserved cpuset of 4 or more, equating to 4 hyper-threads (2 cores), for either of the following CPUs:
- Intel 3rd Generation Xeon (IceLake) 2.20 GHz or better CPUs with host firmware tuned for maximum performance
- AMD EPYC Zen 4 CPUs (Genoa, Bergamo, or newer)
Note: AMD EPYC Zen 4 CPUs (Genoa, Bergamo, or newer) are fully supported. Power consumption evaluations are ongoing. It is recommended to evaluate features, such as per-pod power management, to determine any potential impact on performance.
- Set the reserved cpuset to include both hyper-thread siblings for each included core. Unreserved cores are available as allocatable CPU for scheduling workloads.
- Ensure that hyper-thread siblings are not split across reserved and isolated cores.
- Ensure that reserved and isolated CPUs include all the threads for all cores in the CPU.
- Include Core 0 for each NUMA node in the reserved CPU set.
- Set the huge page size to 1G.
- Pin only those OpenShift Container Platform pods that are configured by default as part of the management workload partition to reserved cores.
- Engineering considerations
- Meeting the full performance metrics requires use of the RT kernel. If required, you can use the non-RT kernel with corresponding impact to performance.
- The number of hugepages you configure depends on application workload requirements. Variation in this parameter is expected and allowed.
- Variation is expected in the configuration of reserved and isolated CPU sets based on selected hardware and additional components in use on the system. The variation must still meet the specified limits.
- Hardware without IRQ affinity support affects isolated CPUs. To ensure that pods with guaranteed whole CPU QoS have full use of allocated CPUs, all hardware in the server must support IRQ affinity.
- When workload partitioning is enabled by setting cpuPartitioningMode to AllNodes during deployment, you must allocate enough CPUs in the PerformanceProfile CR to support the operating system, interrupts, and OpenShift Container Platform pods. A minimal PerformanceProfile sketch follows this list.
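The following is a minimal sketch of a PerformanceProfile CR that reflects these constraints. The CPU ranges, hugepage count, and node selector are placeholder values for an assumed 32-core hyper-threaded host; align them with your hardware topology and the reference CRs:

apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  name: openshift-node-performance-profile
spec:
  cpu:
    reserved: "0-1,32-33"        # both hyper-thread siblings of cores 0 and 1 (example values)
    isolated: "2-31,34-63"       # remaining threads are allocatable for workloads
  hugepages:
    defaultHugepagesSize: 1G
    pages:
    - size: 1G
      count: 32                  # workload dependent
  realTimeKernel:
    enabled: true                # required to meet the full performance metrics
  nodeSelector:
    node-role.kubernetes.io/master: ""
  numa:
    topologyPolicy: restricted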
Additional resources
3.2.2.3. PTP Operator
- New in this release
- No reference design updates in this release
- Description
- Configure PTP in cluster nodes with PtpConfig CRs for the RAN DU use case, with features like grandmaster clock (T-GM) support via GPS, ordinary clock (OC), boundary clocks (T-BC), dual boundary clocks, high availability (HA), and optional fast event notification over HTTP. PTP ensures precise timing and reliability in the RAN environment. A minimal PtpConfig sketch follows the engineering considerations in this section.
- Limited to two boundary clocks for nodes with dual NICs and HA
- Limited to two Westport channel NIC configurations for T-GM
- Engineering considerations
- RAN DU RDS configurations are provided for ordinary clocks, boundary clocks, grandmaster clocks, and highly available dual NIC boundary clocks.
- PTP fast event notifications use ConfigMap CRs to persist subscriber details.
- Hierarchical event subscription as described in the O-RAN specification is not supported for PTP events.
- Use the PTP fast events REST API v2. The PTP fast events REST API v1 is deprecated. The REST API v2 is O-RAN Release 3 compliant.
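The following is a minimal sketch of a PtpConfig CR for an ordinary clock, assuming a single PTP-capable NIC. The interface name, scheduling priority, and node label are placeholder values; the reference PTP configuration CRs in the RDS remain the authoritative starting point:

apiVersion: ptp.openshift.io/v1
kind: PtpConfig
metadata:
  name: ordinary-clock
  namespace: openshift-ptp
spec:
  profile:
  - name: ordinary-clock
    interface: ens5f0                    # placeholder NIC name
    ptp4lOpts: "-2 -s"                   # slave-only ordinary clock over IEEE 802.3 transport
    phc2sysOpts: "-a -r -n 24"
    ptpSchedulingPolicy: SCHED_FIFO
    ptpSchedulingPriority: 10
  recommend:
  - profile: ordinary-clock
    priority: 4
    match:
    - nodeLabel: node-role.kubernetes.io/worker   # placeholder node selector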
3.2.2.4. SR-IOV Operator
- New in this release
- No reference design updates in this release
- Description
- The SR-IOV Operator provisions and configures the SR-IOV CNI and device plugins. Both netdevice (kernel VFs) and vfio (DPDK) devices are supported and applicable to the RAN DU use models.
- Limits and requirements
- Use devices that are supported for OpenShift Container Platform. See "Supported devices".
- SR-IOV and IOMMU enablement in host firmware settings: The SR-IOV Network Operator automatically enables IOMMU on the kernel command line.
- SR-IOV VFs do not receive link state updates from the PF. If link down detection is required you must configure this at the protocol level.
- Engineering considerations
- SR-IOV interfaces with the vfio driver type are typically used to enable additional secondary networks for applications that require high throughput or low latency.
- Customer variation on the configuration and number of SriovNetwork and SriovNetworkNodePolicy custom resources (CRs) is expected (see the sketch after this list).
- IOMMU kernel command line settings are applied with a MachineConfig CR at install time. This ensures that the SriovOperator CR does not cause a reboot of the node when adding them.
- SR-IOV support for draining nodes in parallel is not applicable in a single-node OpenShift cluster.
- You must include the SriovOperatorConfig CR in your deployment; the CR is not created automatically. This CR is included in the reference configuration policies, which are applied during initial deployment.
- In scenarios where you pin or restrict workloads to specific nodes, the SR-IOV parallel node drain feature will not result in the rescheduling of pods. In these scenarios, the SR-IOV Operator disables the parallel node drain functionality.
- NICs which do not support firmware updates under secure boot or kernel lockdown must be pre-configured with sufficient virtual functions (VFs) to support the number of VFs needed by the application workload. For Mellanox NICs, the Mellanox vendor plugin must be disabled in the SR-IOV Network Operator. For more information, see "Configuring an SR-IOV network device".
- To change the MTU value of a virtual function after the pod has started, do not configure the MTU field in the SriovNetworkNodePolicy CR. Instead, configure NetworkManager or use a custom systemd script to set the MTU of the physical function to an appropriate value. For example:
# ip link set dev <physical_function> mtu 9000
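The following is a minimal sketch of an SriovNetworkNodePolicy CR for a DPDK (vfio-pci) secondary network. The resource name, physical function name, node selector, and VF count are placeholder values that must match your hardware and workload needs:

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: sriov-nnp-du-dpdk
  namespace: openshift-sriov-network-operator
spec:
  resourceName: du_fh                  # placeholder resource name exposed to pods
  deviceType: vfio-pci                 # use netdevice for kernel VFs instead
  nicSelector:
    pfNames:
    - ens7f0                           # placeholder physical function
  nodeSelector:
    node-role.kubernetes.io/worker: ""
  numVfs: 8                            # pre-provision enough VFs for the workload
  priority: 10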
Additional resources
3.2.2.5. Logging
- New in this release
- No reference design updates in this release
- Description
- Use logging to collect logs from the far edge node for remote analysis. The recommended log collector is Vector.
- Engineering considerations
- Handling logs beyond the infrastructure and audit logs, for example, from the application workload, requires additional CPU and network bandwidth based on the additional logging rate.
- As of OpenShift Container Platform 4.14, Vector is the reference log collector. Use of fluentd in the RAN use models is deprecated.
Additional resources
3.2.2.6. SRIOV-FEC Operator
- New in this release
- No reference design updates in this release
- Description
- The SRIOV-FEC Operator is an optional third-party certified Operator that supports FEC accelerator hardware.
- Limits and requirements
Starting with FEC Operator v2.7.0:
- Secure boot is supported
- vfio drivers for PFs require the usage of a vfio-token that is injected into the pods. Applications in the pod can pass the VF token to DPDK by using the EAL parameter --vfio-vf-token.
- Engineering considerations
- The SRIOV-FEC Operator uses CPU cores from the isolated CPU set.
- You can validate FEC readiness as part of the pre-checks for application deployment, for example, by extending the validation policy.
Additional resources
3.2.2.7. Lifecycle Agent
- New in this release
- No reference design updates in this release
- Description
- The Lifecycle Agent provides local lifecycle management services for single-node OpenShift clusters.
- Limits and requirements
- The Lifecycle Agent is not applicable in multi-node clusters or single-node OpenShift clusters with an additional worker.
- The Lifecycle Agent requires a persistent volume that you create when installing the cluster. For descriptions of partition requirements, see "Configuring a shared container directory between ostree stateroots when using GitOps ZTP".
3.2.2.8. Local Storage Operator
- New in this release
- No reference design updates in this release
- Description
- You can use the Local Storage Operator to create persistent volumes that can be used as PVC resources by applications. The number and type of PV resources that you create depend on your requirements.
- Engineering considerations
- Create backing storage for PV CRs before creating the PV. This can be a partition, a local volume, an LVM volume, or a full disk.
- Refer to the device listing in LocalVolume CRs by the hardware path used to access each device to ensure correct allocation of disks and partitions, for example, /dev/disk/by-path/<id>. Logical names (for example, /dev/sda) are not guaranteed to be consistent across node reboots. A minimal sketch follows this list.
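The following is a minimal sketch of a LocalVolume CR that references a device by its stable by-path hardware identifier. The device path and storage class name are placeholder values:

apiVersion: local.storage.openshift.io/v1
kind: LocalVolume
metadata:
  name: local-disks
  namespace: openshift-local-storage
spec:
  storageClassDevices:
  - storageClassName: example-storage-class     # placeholder storage class name
    volumeMode: Filesystem
    fsType: xfs
    devicePaths:
    - /dev/disk/by-path/pci-0000:05:00.0-nvme-1 # placeholder; by-path names are stable across reboots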
3.2.2.9. Logical Volume Manager Storage
- New in this release
- No reference design updates in this release
- Description
- Logical Volume Manager (LVM) Storage is an optional component. It provides dynamic provisioning of both block and file storage by creating logical volumes from local devices that can be consumed as persistent volume claim (PVC) resources by applications. Volume expansion and snapshots are also possible. An example configuration is provided in the RDS with the StorageLVMCluster.yaml file. A minimal LVMCluster sketch follows the engineering considerations in this section.
- Limits and requirements
- In single-node OpenShift clusters, persistent storage must be provided by either LVM Storage or local storage, not both.
- Volume snapshots are excluded from the reference configuration.
- Engineering considerations
- LVM Storage can be used as the local storage implementation for the RAN DU use case. When LVM Storage is used as the storage solution, it replaces the Local Storage Operator, and the CPU required is assigned to the management partition as platform overhead. The reference configuration must include one of these storage solutions but not both.
- Ensure that sufficient disks or partitions are available for storage requirements.
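The following is a minimal sketch of an LVMCluster CR of the kind the StorageLVMCluster.yaml reference file provides. The device class name, thin pool sizing, and device path are placeholder values:

apiVersion: lvm.topolvm.io/v1alpha1
kind: LVMCluster
metadata:
  name: lvmcluster
  namespace: openshift-storage
spec:
  storage:
    deviceClasses:
    - name: vg1                        # placeholder volume group name
      default: true
      thinPoolConfig:
        name: thin-pool-1
        sizePercent: 90                # portion of the volume group used for the thin pool
        overprovisionRatio: 10
      deviceSelector:
        paths:
        - /dev/disk/by-path/pci-0000:87:00.0-nvme-1   # placeholder device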
3.2.2.10. Workload partitioning
- New in this release
- No reference design updates in this release
- Description
- Workload partitioning pins OpenShift Container Platform and Day 2 Operator pods that are part of the DU profile to the reserved CPU set and removes the reserved CPUs from node accounting. This leaves all non-reserved CPU cores available for user workloads. Workload partitioning is enabled through a capability set in the installation parameters: cpuPartitioningMode: AllNodes. The set of management partition cores is set with the reserved CPU set that you configure in the PerformanceProfile CR.
- Limits and requirements
- Namespace and Pod CRs must be annotated to allow the pod to be applied to the management partition (see the sketch after this list).
- Pods with CPU limits cannot be allocated to the partition. This is because mutation can change the pod QoS.
- For more information about the minimum number of CPUs that can be allocated to the management partition, see "Node Tuning Operator".
- Engineering considerations
- Workload partitioning pins all management pods to reserved cores. A sufficient number of cores must be allocated to the reserved set to account for operating system, management pods, and expected spikes in CPU use that occur when the workload starts, the node reboots, or other system events happen.
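The following is a minimal sketch of the annotations that allow a namespace and pod to run in the management partition. The namespace, pod, and image names are hypothetical, and this applies only to platform and Day 2 Operator pods, not user workloads:

apiVersion: v1
kind: Namespace
metadata:
  name: example-operator-ns                      # hypothetical namespace
  annotations:
    workload.openshift.io/allowed: management    # permits management-partition pods in this namespace
---
apiVersion: v1
kind: Pod
metadata:
  name: example-operator-pod                     # hypothetical pod
  namespace: example-operator-ns
  annotations:
    target.workload.openshift.io/management: '{"effect": "PreferredDuringScheduling"}'
spec:
  containers:
  - name: operator
    image: registry.example.com/operator:latest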
Additional resources
3.2.2.11. Cluster tuning
- New in this release
- No reference design updates in this release
- Description
- See "Cluster capabilities" for a full list of components that can be disabled by using the cluster capabilities feature.
- Limits and requirements
- Cluster capabilities are not available for installer-provisioned installation methods.
- Engineering considerations
- In clusters running OpenShift Container Platform 4.16 and later, the cluster does not automatically revert to cgroup v1 when a PerformanceProfile is applied. If workloads running on the cluster require cgroup v1, the cluster must be configured for cgroup v1. For more information, see "Enabling Linux control group version 1 (cgroup v1)". You should make this configuration as part of the initial cluster deployment.
Note: Support for cgroup v1 is planned for removal in OpenShift Container Platform 4.19. Clusters running cgroup v1 must transition to cgroup v2.
The following table lists the required platform tuning configurations:
Feature | Description |
---|---|
Remove optional cluster capabilities | Reduce the OpenShift Container Platform footprint by disabling optional cluster Operators on single-node OpenShift clusters only. A minimal install-config sketch follows this table. |
Configure cluster monitoring | Configure the monitoring stack for reduced footprint by disabling Alertmanager and Telemeter and setting Prometheus retention to 24 hours. |
Disable networking diagnostics | Disable networking diagnostics for single-node OpenShift because they are not required. |
Configure a single OperatorHub catalog source | Configure the cluster to use a single catalog source that contains only the Operators required for a RAN DU deployment. Each catalog source increases the CPU use on the cluster. Using a single CatalogSource fits within the platform CPU budget. |
Disable the Console Operator | If the cluster was deployed with the console disabled, this CR is not required. |
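The following is a minimal install-config.yaml fragment showing how optional cluster capabilities can be trimmed at install time. Treat the capability names below as illustrative; the exact list must match the reference configuration for your release:

capabilities:
  baselineCapabilitySet: None          # start from no optional capabilities
  additionalEnabledCapabilities:       # re-enable only what the RAN DU profile needs
  - NodeTuning
  - OperatorLifecycleManager
  - Ingress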
Additional resources
3.2.2.12. Machine configuration
- New in this release
- No reference design updates in this release
- Limits and requirements
- The CRI-O wipe disable MachineConfig CR assumes that images on disk are static other than during scheduled maintenance in defined maintenance windows. To ensure the images are static, do not set the pod imagePullPolicy field to Always.
Feature | Description |
---|---|
Container runtime | Sets the container runtime to crun for all node roles. |
Kubelet config and container mount namespace hiding | Reduces the frequency of kubelet housekeeping and eviction monitoring, which reduces CPU usage |
SCTP | Optional configuration (enabled by default) |
Kdump | Optional configuration (enabled by default) Enables kdump to capture debug information when a kernel panic occurs. The reference CRs that enable kdump have an increased memory reservation based on the set of drivers and kernel modules included in the reference configuration. |
CRI-O wipe disable | Disables automatic wiping of the CRI-O image cache after unclean shutdown |
SR-IOV-related kernel arguments | Include additional SR-IOV-related arguments in the kernel command line |
Set RCU Normal | Systemd service that sets rcu_normal after the node has booted. |
One-shot time sync | Runs a one-time NTP system time synchronization job for control plane or worker nodes. |
Additional resources
3.2.3. Telco RAN DU deployment components
The following sections describe the various OpenShift Container Platform components and configurations that you use to configure the hub cluster with RHACM.
3.2.3.1. Red Hat Advanced Cluster Management
- New in this release
- No reference design updates in this release
- Description
RHACM provides Multi Cluster Engine (MCE) installation and ongoing lifecycle management functionality for deployed clusters. You manage cluster configuration and upgrades declaratively by applying Policy custom resources (CRs) to clusters during maintenance windows.
RHACM provides the following functionality:
- Zero touch provisioning (ZTP) of clusters using the MCE component in RHACM.
- Configuration, upgrades, and cluster status through the RHACM policy controller.
- During managed cluster installation, RHACM can apply labels to individual nodes as configured through the ClusterInstance CR.
- Limits and requirements
- A single hub cluster supports up to 3500 deployed single-node OpenShift clusters with 5 Policy CRs bound to each cluster.
- Engineering considerations
- Use RHACM policy hub-side templating to better scale cluster configuration. You can significantly reduce the number of policies by using a single group policy or small number of general group policies where the group and per-cluster values are substituted into templates.
- Cluster-specific configuration: managed clusters typically have some number of configuration values that are specific to the individual cluster. These configurations should be managed using RHACM policy hub-side templating, with values pulled from ConfigMap CRs based on the cluster name (see the sketch after this list).
- To save CPU resources on managed clusters, policies that apply static configurations should be unbound from managed clusters after GitOps ZTP installation of the cluster.
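The following is a minimal sketch of hub-side templating in a ConfigurationPolicy object template. Hub templates are resolved on the hub when the object is delivered through a Policy, for example one generated by PolicyGenerator or PolicyGenTemplate CRs. The ConfigMap namespace, name pattern, and key are placeholder values:

apiVersion: policy.open-cluster-management.io/v1
kind: ConfigurationPolicy
metadata:
  name: example-sriov-numvfs
spec:
  remediationAction: enforce
  severity: low
  object-templates:
  - complianceType: musthave
    objectDefinition:
      apiVersion: sriovnetwork.openshift.io/v1
      kind: SriovNetworkNodePolicy
      metadata:
        name: sriov-nnp-du-fh
        namespace: openshift-sriov-network-operator
      spec:
        # per-cluster value pulled from a ConfigMap named after the managed cluster (placeholder names)
        numVfs: '{{hub fromConfigMap "ztp-site-data" (printf "%s-config" .ManagedClusterName) "numVfs" | toInt hub}}'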
3.2.4. SiteConfig Operator
- New in this release
- No RDS updates in this release
- Description
The SiteConfig Operator is a template-driven solution designed to provision clusters through various installation methods. It introduces the unified ClusterInstance API, which replaces the deprecated SiteConfig API. By leveraging the ClusterInstance API, the SiteConfig Operator improves cluster provisioning by providing the following:
- Better isolation of definitions from installation methods
- Unification of Git and non-Git workflows
- Consistent APIs across installation methods
- Enhanced scalability
- Increased flexibility with custom installation templates
- Valuable insights for troubleshooting deployment issues
The SiteConfig Operator provides validated default installation templates to facilitate cluster deployment through both the Assisted Installer and Image-based Installer provisioning methods:
- Assisted Installer automates the deployment of OpenShift Container Platform clusters by leveraging predefined configurations and validated host setups. It ensures that the target infrastructure meets OpenShift Container Platform requirements. The Assisted Installer streamlines the installation process while minimizing time and complexity compared to manual setup.
- Image-based Installer expedites the deployment of single-node OpenShift clusters by utilizing preconfigured and validated OpenShift Container Platform seed images. Seed images are preinstalled on target hosts, enabling rapid reconfiguration and deployment. The Image-based Installer is particularly well-suited for remote or disconnected environments, because it simplifies the cluster creation process and significantly reduces deployment time.
- Limits and requirements
- A single hub cluster supports up to 3500 deployed single-node OpenShift clusters.
3.2.4.1. Topology Aware Lifecycle Manager
- New in this release
- No reference design updates in this release
- Description
Topology Aware Lifecycle Manager (TALM) is an Operator that runs only on the hub cluster for managing how changes like cluster upgrades, Operator upgrades, and cluster configuration are rolled out to the network. TALM supports the following features:
- Progressive rollout of policy updates to fleets of clusters in user configurable batches.
- Per-cluster actions add ztp-done labels or other user-configurable labels following configuration changes to managed clusters.
- Pre-caching of single-node OpenShift cluster images: TALM supports optional pre-caching of OpenShift Container Platform, OLM Operator, and additional user images to single-node OpenShift clusters before initiating an upgrade. The pre-caching feature is not applicable when using the recommended image-based upgrade method for upgrading single-node OpenShift clusters.
- Specifying optional pre-caching configurations with PreCachingConfig CRs. Review the sample reference PreCachingConfig CR for more information.
- Excluding unused images with configurable filtering.
- Enabling before and after pre-caching storage space validations with configurable space-required parameters.
- Limits and requirements
- Supports concurrent cluster deployment in batches of 400
- Pre-caching and backup are limited to single-node OpenShift clusters only
- Engineering considerations
- The PreCachingConfig CR is optional and does not need to be created if you only need to precache platform-related OpenShift and OLM Operator images.
- The PreCachingConfig CR must be applied before referencing it in the ClusterGroupUpgrade CR.
- Only policies with the ran.openshift.io/ztp-deploy-wave annotation are automatically applied by TALM during cluster installation.
- Any policy can be remediated by TALM under control of a user-created ClusterGroupUpgrade CR (see the sketch after this list).
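The following is a minimal sketch of a user-created ClusterGroupUpgrade CR that remediates a set of policies in batches. The cluster names, policy names, and namespace are placeholder values:

apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
  name: cgu-du-upgrade
  namespace: default                  # placeholder namespace
spec:
  clusters:
  - sno-site-1                        # placeholder managed cluster names
  - sno-site-2
  managedPolicies:
  - du-upgrade-platform-upgrade       # placeholder policy names
  remediationStrategy:
    maxConcurrency: 400               # upper limit of clusters per batch
    timeout: 240
  enable: true
  preCaching: false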
Additional resources
3.2.4.2. GitOps Operator and GitOps ZTP
- New in this release
- No reference design updates in this release
- Description
GitOps Operator and GitOps ZTP provide a GitOps-based infrastructure for managing cluster deployment and configuration. Cluster definitions and configurations are maintained as a declarative state in Git. You can apply ClusterInstance CRs to the hub cluster where the SiteConfig Operator renders them as installation CRs. In earlier releases, a GitOps ZTP plugin supported the generation of installation CRs from SiteConfig CRs. This plugin is now deprecated. A separate GitOps ZTP plugin is available to enable automatic wrapping of configuration CRs into policies based on the PolicyGenerator or PolicyGenTemplate CR.
You can deploy and manage multiple versions of OpenShift Container Platform on managed clusters by using the baseline reference configuration CRs. You can use custom CRs alongside the baseline CRs. To maintain multiple per-version policies simultaneously, use Git to manage the versions of the source and policy CRs by using PolicyGenerator or PolicyGenTemplate CRs.
- Limits and requirements
- 300 ClusterInstance CRs per ArgoCD application. Multiple applications can be used to achieve the maximum number of clusters supported by a single hub cluster.
- Content in the source-crs/ directory in Git overrides content provided in the ZTP plugin container, as Git takes precedence in the search path.
- The source-crs/ directory is specifically expected to be located in the same directory as the kustomization.yaml file, which includes PolicyGenerator or PolicyGenTemplate CRs as a generator. Alternative locations for the source-crs/ directory are not supported in this context.
- Engineering considerations
- For multi-node cluster upgrades, you can pause MachineConfigPool (MCP) CRs during maintenance windows by setting the paused field to true. You can increase the number of simultaneously updated nodes per MCP CR by configuring the maxUnavailable setting in the MCP CR. The maxUnavailable field defines the percentage of nodes in the pool that can be simultaneously unavailable during a MachineConfig update. Set maxUnavailable to the maximum tolerable value. This reduces the number of reboots in a cluster during upgrades, which results in shorter upgrade times. When you finally unpause the MCP CR, all the changed configurations are applied with a single reboot.
- During cluster installation, you can pause custom MCP CRs by setting the paused field to true and setting maxUnavailable to 100% to improve installation times (see the sketch at the end of this section).
- Keep reference CRs and custom CRs under different directories. Doing this allows you to patch and update the reference CRs by simple replacement of all directory contents without touching the custom CRs.
- When managing multiple versions, the following best practices are recommended:
- Keep all source CRs and policy creation CRs in Git repositories to ensure consistent generation of policies for each OpenShift Container Platform version based solely on the contents in Git.
- Keep reference source CRs in a separate directory from custom CRs. This facilitates easy update of reference CRs as required.
- To avoid confusion or unintentional overwrites when updating content, it is highly recommended to use unique and distinguishable names for custom CRs in the source-crs/ directory and extra manifests in Git.
- Extra installation manifests are referenced in the ClusterInstance CR through a ConfigMap CR. The ConfigMap CR should be stored alongside the ClusterInstance CR in Git, serving as the single source of truth for the cluster. If needed, you can use a ConfigMap generator to create the ConfigMap CR.
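The following is a minimal sketch of a custom MachineConfigPool CR fragment with the paused and maxUnavailable settings described above. The pool name and selector labels are placeholder values:

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: worker-du                                   # placeholder custom pool name
spec:
  paused: true                                      # hold MachineConfig rollouts during the maintenance window
  maxUnavailable: "100%"                            # allow all nodes in the pool to update simultaneously
  machineConfigSelector:
    matchExpressions:
    - key: machineconfiguration.openshift.io/role
      operator: In
      values: [worker, worker-du]
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/worker-du: ""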
3.2.4.3. Agent-based Installer
- New in this release
- No reference design updates in this release
- Description
- The optional Agent-based Installer component provides installation capabilities without centralized infrastructure. The installation program creates an ISO image that you mount to the server. When the server boots, it installs OpenShift Container Platform and the supplied extra manifests. The Agent-based Installer allows you to install OpenShift Container Platform without a hub cluster. A container image registry is required for cluster installation.
- Limits and requirements
- You can supply a limited set of additional manifests at installation time.
- You must include MachineConfiguration CRs that are required by the RAN DU use case.
- Engineering considerations
- The Agent-based Installer provides a baseline OpenShift Container Platform installation.
- You install Day 2 Operators and the remainder of the RAN DU use case configurations after installation.
Additional resources
3.2.5. Telco RAN DU reference configuration CRs
Use the following custom resources (CRs) to configure and deploy OpenShift Container Platform clusters with the telco RAN DU profile. Use the CRs to form the common baseline used in all the specific use models unless otherwise indicated.
You can extract the complete set of RAN DU CRs from the ztp-site-generate container image. See "Preparing the GitOps ZTP site configuration repository" for more information.
Additional resources
3.2.5.1. Cluster tuning reference CRs
Component | Reference CR | Description | Optional |
---|---|---|---|
Cluster capabilities |
| Representative SiteConfig CR to install single-node OpenShift with the RAN DU profile | No |
Console disable |
| Disables the Console Operator. | No |
Disconnected registry |
| Defines a dedicated namespace for managing the OpenShift Operator Marketplace. | No |
Disconnected registry |
| Configures the catalog source for the disconnected registry. | No |
Disconnected registry |
| Disables performance profiling for OLM. | No |
Disconnected registry |
| Configures disconnected registry image content source policy. | No |
Disconnected registry |
| Optional, for multi-node clusters only. Configures the OperatorHub in OpenShift, disabling all default Operator sources. Not required for single-node OpenShift installs with marketplace capability disabled. | No |
Monitoring configuration |
| Reduces the monitoring footprint by disabling Alertmanager and Telemeter, and sets Prometheus retention to 24 hours | No |
Network diagnostics disable |
| Configures the cluster network settings to disable built-in network troubleshooting and diagnostic features. | No |
3.2.5.2. Day 2 Operators reference CRs
Component | Reference CR | Description | Optional |
---|---|---|---|
Cluster Logging Operator |
| Configures log forwarding for the cluster. | No |
Cluster Logging Operator |
| Configures the namespace for cluster logging. | No |
Cluster Logging Operator |
| Configures Operator group for cluster logging. | No |
Cluster Logging Operator |
| New in 4.18. Configures the cluster logging service account. | No |
Cluster Logging Operator |
| New in 4.18. Configures the cluster logging service account. | No |
Cluster Logging Operator |
| New in 4.18. Configures the cluster logging service account. | No |
Cluster Logging Operator |
| Manages installation and updates for the Cluster Logging Operator. | No |
Lifecycle Agent |
| Manages the image-based upgrade process in OpenShift. | Yes
Lifecycle Agent |
| Manages installation and updates for the LCA Operator. | Yes |
Lifecycle Agent |
| Configures namespace for LCA subscription. | Yes |
Lifecycle Agent |
| Configures the Operator group for the LCA subscription. | Yes |
Local Storage Operator |
| Defines a storage class with a Delete reclaim policy and no dynamic provisioning in the cluster. | No |
Local Storage Operator |
| Configures local storage devices for the example-storage-class in the openshift-local-storage namespace, specifying device paths and filesystem type. | No |
Local Storage Operator |
| Creates the namespace with annotations for workload management and the deployment wave for the Local Storage Operator. | No |
Local Storage Operator |
| Creates the Operator group for the Local Storage Operator. | No |
Local Storage Operator |
| Creates the namespace for the Local Storage Operator with annotations for workload management and deployment wave. | No |
LVM Operator |
| Verifies the installation or upgrade of the LVM Storage Operator. | Yes |
LVM Operator |
| Defines an LVM cluster configuration, with placeholders for storage device classes and volume group settings. Optional substitute for the Local Storage Operator. | No |
LVM Operator |
| Manages installation and updates of the LVMS Operator. Optional substitute for the Local Storage Operator. | No |
LVM Operator |
| Creates the namespace for the LVMS Operator with labels and annotations for cluster monitoring and workload management. Optional substitute for the Local Storage Operator. | No |
LVM Operator |
| Defines the target namespace for the LVMS Operator. Optional substitute for the Local Storage Operator. | No |
Node Tuning Operator |
| Configures node performance settings in an OpenShift cluster, optimizing for low latency and real-time workloads. | No |
Node Tuning Operator |
| Applies performance tuning settings, including scheduler groups and service configurations for nodes in the specific namespace. | No |
PTP fast event notifications |
| Configures PTP settings for PTP boundary clocks with additional options for event synchronization. Dependent on cluster role. | No |
PTP fast event notifications |
| Configures PTP for highly available boundary clocks with additional PTP fast event settings. Dependent on cluster role. | No |
PTP fast event notifications |
| Configures PTP for PTP grandmaster clocks with additional PTP fast event settings. Dependent on cluster role. | No |
PTP fast event notifications |
| Configures PTP for PTP ordinary clocks with additional PTP fast event settings. Dependent on cluster role. | No |
PTP fast event notifications |
| Overrides the default OperatorConfig. Configures the PTP Operator specifying node selection criteria for running PTP daemons in the openshift-ptp namespace. | No |
PTP Operator |
| Configures PTP settings for PTP boundary clocks. Dependent on cluster role. | No |
PTP Operator |
| Configures PTP grandmaster clock settings for hosts that have dual NICs. Dependent on cluster role. | No |
PTP Operator |
| Configures PTP grandmaster clock settings for hosts that have a single NIC. Dependent on cluster role. | No |
PTP Operator |
| Configures PTP settings for a PTP ordinary clock. Dependent on cluster role. | No |
PTP Operator |
| Configures the PTP Operator settings, specifying node selection criteria for running PTP daemons in the openshift-ptp namespace. | No |
PTP Operator |
| Manages installation and updates of the PTP Operator in the openshift-ptp namespace. | No |
PTP Operator |
| Configures the namespace for the PTP Operator. | No |
PTP Operator |
| Configures the Operator group for the PTP Operator. | No |
PTP Operator (high availability) |
| Configures PTP settings for highly available PTP boundary clocks. | No |
PTP Operator (high availability) |
| Configures PTP settings for highly available PTP boundary clocks. | No |
SR-IOV FEC Operator |
| Configures namespace for the VRAN Acceleration Operator. Optional part of application workload. | Yes |
SR-IOV FEC Operator |
| Configures the Operator group for the VRAN Acceleration Operator. Optional part of application workload. | Yes |
SR-IOV FEC Operator |
| Manages installation and updates for the VRAN Acceleration Operator. Optional part of application workload. | Yes |
SR-IOV FEC Operator |
| Configures SR-IOV FPGA Ethernet Controller (FEC) settings for nodes, specifying drivers, VF amount, and node selection. | Yes |
SR-IOV Operator |
| Defines an SR-IOV network configuration, with placeholders for various network settings. | No |
SR-IOV Operator |
| Configures SR-IOV network settings for specific nodes, including device type, RDMA support, physical function names, and the number of virtual functions. | No |
SR-IOV Operator |
| Configures SR-IOV Network Operator settings, including node selection, injector, and webhook options. | No |
SR-IOV Operator |
| Configures the SR-IOV Network Operator settings for single-node OpenShift, including node selection, injector, webhook options, and disabling node drain, in the openshift-sriov-network-operator namespace. | No |
SR-IOV Operator |
| Manages the installation and updates of the SR-IOV Network Operator. | No |
SR-IOV Operator |
| Creates the namespace for the SR-IOV Network Operator with specific annotations for workload management and deployment waves. | No |
SR-IOV Operator |
| Defines the target namespace for the SR-IOV Network Operators, enabling their management and deployment within this namespace. | No |
3.2.5.3. Machine configuration reference CRs
Component | Reference CR | Description | Optional |
---|---|---|---|
Container runtime (crun) |
| Configures the container runtime (crun) for control plane nodes. | No |
Container runtime (crun) |
| Configures the container runtime (crun) for worker nodes. | No |
CRI-O wipe disable |
| Disables automatic CRI-O cache wipe following a reboot on control plane nodes. | No
CRI-O wipe disable |
| Disables automatic CRI-O cache wipe following a reboot on worker nodes. | No
Kdump enable |
| Configures kdump crash reporting on master nodes. | No |
Kdump enable |
| Configures kdump crash reporting on worker nodes. | No |
Kubelet configuration and container mount hiding |
| Configures a mount namespace for sharing container-specific mounts between kubelet and CRI-O on control plane nodes. | No |
Kubelet configuration and container mount hiding |
| Configures a mount namespace for sharing container-specific mounts between kubelet and CRI-O on worker nodes. | No |
One-shot time sync |
| Synchronizes time once on master nodes. | No |
One-shot time sync |
| Synchronizes time once on worker nodes. | No |
SCTP |
| Loads the SCTP kernel module on master nodes. | Yes |
SCTP |
| Loads the SCTP kernel module on worker nodes. | Yes |
Set RCU normal |
| Disables rcu_expedited by setting rcu_normal after the control plane node has booted. | No |
Set RCU normal |
| Disables rcu_expedited by setting rcu_normal after the worker node has booted. | No |
SRIOV-related kernel arguments |
| Enables SR-IOV support on master nodes. | No |
3.2.6. Comparing a cluster with the telco RAN DU reference configuration
After you deploy a telco RAN DU cluster, you can use the cluster-compare plugin to assess the cluster’s compliance with the telco RAN DU reference design specifications (RDS). The cluster-compare plugin is an OpenShift CLI (oc) plugin. The plugin uses a telco RAN DU reference configuration to validate the cluster with the telco RAN DU custom resources (CRs).
The plugin-specific reference configuration for telco RAN DU is packaged in a container image with the telco RAN DU CRs.
For further information about the cluster-compare plugin, see "Understanding the cluster-compare plugin".
Prerequisites
- You have access to the cluster as a user with the cluster-admin role.
- You have credentials to access the registry.redhat.io container image registry.
- You installed the cluster-compare plugin.
Procedure
Log in to the container image registry with your credentials by running the following command:
$ podman login registry.redhat.io
Extract the content from the ztp-site-generate-rhel8 container image by running the following commands:
$ podman pull registry.redhat.io/openshift4/ztp-site-generate-rhel8:v4.18
$ mkdir -p ./out
$ podman run --log-driver=none --rm registry.redhat.io/openshift4/ztp-site-generate-rhel8:v4.18 extract /home/ztp --tar | tar x -C ./out
Compare the configuration for your cluster to the reference configuration by running the following command:
$ oc cluster-compare -r out/reference/metadata.yaml
Example output
...
**********************************

Cluster CR: config.openshift.io/v1_OperatorHub_cluster 1
Reference File: required/other/operator-hub.yaml 2
Diff Output: diff -u -N /tmp/MERGED-2801470219/config-openshift-io-v1_operatorhub_cluster /tmp/LIVE-2569768241/config-openshift-io-v1_operatorhub_cluster
--- /tmp/MERGED-2801470219/config-openshift-io-v1_operatorhub_cluster  2024-12-12 14:13:22.898756462 +0000
+++ /tmp/LIVE-2569768241/config-openshift-io-v1_operatorhub_cluster    2024-12-12 14:13:22.898756462 +0000
@@ -1,6 +1,6 @@
 apiVersion: config.openshift.io/v1
 kind: OperatorHub
 metadata:
+  annotations: 3
+    include.release.openshift.io/hypershift: "true"
   name: cluster
-spec:
-  disableAllDefaultSources: true

**********************************

Summary 4
CRs with diffs: 11/12 5
CRs in reference missing from the cluster: 40 6
optional-image-registry:
  image-registry:
    Missing CRs: 7
      - optional/image-registry/ImageRegistryPV.yaml
optional-ptp-config:
  ptp-config:
    One of the following is required:
      - optional/ptp-config/PtpConfigBoundary.yaml
      - optional/ptp-config/PtpConfigGmWpc.yaml
      - optional/ptp-config/PtpConfigDualCardGmWpc.yaml
      - optional/ptp-config/PtpConfigForHA.yaml
      - optional/ptp-config/PtpConfigMaster.yaml
      - optional/ptp-config/PtpConfigSlave.yaml
      - optional/ptp-config/PtpConfigSlaveForEvent.yaml
      - optional/ptp-config/PtpConfigForHAForEvent.yaml
      - optional/ptp-config/PtpConfigMasterForEvent.yaml
      - optional/ptp-config/PtpConfigBoundaryForEvent.yaml
  ptp-operator-config:
    One of the following is required:
      - optional/ptp-config/PtpOperatorConfig.yaml
      - optional/ptp-config/PtpOperatorConfigForEvent.yaml
optional-storage:
  storage:
    Missing CRs:
      - optional/local-storage-operator/StorageLV.yaml
...
No CRs are unmatched to reference CRs 8
Metadata Hash: 09650c31212be9a44b99315ec14d2e7715ee194a5d68fb6d24f65fd5ddbe3c3c 9
No patched CRs 10
- 1
- The CR under comparison. The plugin displays each CR with a difference from the corresponding template.
- 2
- The template matching with the CR for comparison.
- 3
- The output in Linux diff format shows the difference between the template and the cluster CR.
- 4
- After the plugin reports the line diffs for each CR, the summary of differences is reported.
- 5
- The number of CRs in the comparison with differences from the corresponding templates.
- 6
- The number of CRs represented in the reference configuration, but missing from the live cluster.
- 7
- The list of CRs represented in the reference configuration, but missing from the live cluster.
- 8
- The CRs that did not match to a corresponding template in the reference configuration.
- 9
- The metadata hash identifies the reference configuration.
- 10
- The list of patched CRs.
3.2.7. Telco RAN DU 4.18 validated software components
The Red Hat telco RAN DU 4.18 solution has been validated using the following Red Hat software products for OpenShift Container Platform managed clusters.
Component | Software version |
---|---|
Managed cluster version | 4.18 |
Cluster Logging Operator | 6.1 [1]
Local Storage Operator | 4.18 |
OpenShift API for Data Protection (OADP) | 1.4 |
PTP Operator | 4.18 |
SR-IOV Operator | 4.18 |
SRIOV-FEC Operator | 2.10 |
Lifecycle Agent | 4.18 |
[1] This table will be updated when the aligned Cluster Logging Operator version 6.2 is released.
3.2.8. Telco RAN DU 4.18 hub cluster validated software components
The Red Hat telco RAN 4.18 solution has been validated using the following Red Hat software products for OpenShift Container Platform hub clusters.
Component | Software version |
---|---|
Hub cluster version | 4.18 |
Red Hat Advanced Cluster Management (RHACM) | 2.12 [1]
Red Hat OpenShift GitOps | 1.14 |
GitOps ZTP site generate plugins | 4.18 |
Topology Aware Lifecycle Manager (TALM) | 4.18 |
[1] This table will be updated when the aligned RHACM version 2.13 is released.