Edge computing
Configure and deploy OpenShift Container Platform clusters at the network edge
Abstract
Chapter 1. Challenges of the network far edge
Edge computing presents complex challenges when managing many sites in geographically displaced locations. Use GitOps Zero Touch Provisioning (ZTP) to provision and manage sites at the far edge of the network.
1.1. Overcoming the challenges of the network far edge
Today, service providers want to deploy their infrastructure at the edge of the network. This presents significant challenges:
- How do you handle deployments of many edge sites in parallel?
- What happens when you need to deploy sites in disconnected environments?
- How do you manage the lifecycle of large fleets of clusters?
GitOps Zero Touch Provisioning (ZTP) and GitOps meets these challenges by allowing you to provision remote edge sites at scale with declarative site definitions and configurations for bare-metal equipment. Template or overlay configurations install OpenShift Container Platform features that are required for CNF workloads. The full lifecycle of installation and upgrades is handled through the GitOps ZTP pipeline.
GitOps ZTP uses GitOps for infrastructure deployments. With GitOps, you use declarative YAML files and other defined patterns stored in Git repositories. Red Hat Advanced Cluster Management (RHACM) uses your Git repositories to drive the deployment of your infrastructure.
GitOps provides traceability, role-based access control (RBAC), and a single source of truth for the desired state of each site. Scalability issues are addressed by Git methodologies and event driven operations through webhooks.
You start the GitOps ZTP workflow by creating declarative site definition and configuration custom resources (CRs) that the GitOps ZTP pipeline delivers to the edge nodes.
The following diagram shows how GitOps ZTP works within the far edge framework.
1.2. Using GitOps ZTP to provision clusters at the network far edge
Red Hat Advanced Cluster Management (RHACM) manages clusters in a hub-and-spoke architecture, where a single hub cluster manages many spoke clusters. Hub clusters running RHACM provision and deploy the managed clusters by using GitOps Zero Touch Provisioning (ZTP) and the assisted service that is deployed when you install RHACM.
The assisted service handles provisioning of OpenShift Container Platform on single node clusters, three-node clusters, or standard clusters running on bare metal.
A high-level overview of using GitOps ZTP to provision and maintain bare-metal hosts with OpenShift Container Platform is as follows:
- A hub cluster running RHACM manages an OpenShift image registry that mirrors the OpenShift Container Platform release images. RHACM uses the OpenShift image registry to provision the managed clusters.
- You manage the bare-metal hosts in a YAML format inventory file, versioned in a Git repository.
- You make the hosts ready for provisioning as managed clusters, and use RHACM and the assisted service to install the bare-metal hosts on site.
Installing and deploying the clusters is a two-stage process, involving an initial installation phase, and a subsequent configuration and deployment phase. The following diagram illustrates this workflow:
1.3. Installing managed clusters with SiteConfig resources and RHACM
GitOps Zero Touch Provisioning (ZTP) uses SiteConfig
custom resources (CRs) in a Git repository to manage the processes that install OpenShift Container Platform clusters. The SiteConfig
CR contains cluster-specific parameters required for installation. It has options for applying select configuration CRs during installation including user defined extra manifests.
The GitOps ZTP plugin processes SiteConfig
CRs to generate a collection of CRs on the hub cluster. This triggers the assisted service in Red Hat Advanced Cluster Management (RHACM) to install OpenShift Container Platform on the bare-metal host. You can find installation status and error messages in these CRs on the hub cluster.
You can provision single clusters manually or in batches with GitOps ZTP:
- Provisioning a single cluster
-
Create a single
SiteConfig
CR and related installation and configuration CRs for the cluster, and apply them in the hub cluster to begin cluster provisioning. This is a good way to test your CRs before deploying on a larger scale. - Provisioning many clusters
-
Install managed clusters in batches of up to 400 by defining
SiteConfig
and related CRs in a Git repository. ArgoCD uses theSiteConfig
CRs to deploy the sites. The RHACM policy generator creates the manifests and applies them to the hub cluster. This starts the cluster provisioning process.
1.4. Configuring managed clusters with policies and PolicyGenTemplate resources
GitOps Zero Touch Provisioning (ZTP) uses Red Hat Advanced Cluster Management (RHACM) to configure clusters by using a policy-based governance approach to applying the configuration.
The policy generator or PolicyGen
is a plugin for the GitOps Operator that enables the creation of RHACM policies from a concise template. The tool can combine multiple CRs into a single policy, and you can generate multiple policies that apply to various subsets of clusters in your fleet.
For scalability and to reduce the complexity of managing configurations across the fleet of clusters, use configuration CRs with as much commonality as possible.
- Where possible, apply configuration CRs using a fleet-wide common policy.
- The next preference is to create logical groupings of clusters to manage as much of the remaining configurations as possible under a group policy.
- When a configuration is unique to an individual site, use RHACM templating on the hub cluster to inject the site-specific data into a common or group policy. Alternatively, apply an individual site policy for the site.
The following diagram shows how the policy generator interacts with GitOps and RHACM in the configuration phase of cluster deployment.
For large fleets of clusters, it is typical for there to be a high-level of consistency in the configuration of those clusters.
The following recommended structuring of policies combines configuration CRs to meet several goals:
- Describe common configurations once and apply to the fleet.
- Minimize the number of maintained and managed policies.
- Support flexibility in common configurations for cluster variants.
Policy category | Description |
---|---|
Common |
A policy that exists in the common category is applied to all clusters in the fleet. Use common |
Groups |
A policy that exists in the groups category is applied to a group of clusters in the fleet. Use group |
Sites | A policy that exists in the sites category is applied to a specific cluster site. Any cluster can have its own specific policies maintained. |
Using PolicyGenTemplate
CRs to manage and deploy polices to managed clusters will be deprecated in an upcoming OpenShift Container Platform release. Equivalent and improved functionality is available using Red Hat Advanced Cluster Management (RHACM) and PolicyGenerator
CRs.
For more information about PolicyGenerator
resources, see the RHACM Policy Generator documentation.
Chapter 2. Preparing the hub cluster for GitOps ZTP
To use RHACM in a disconnected environment, create a mirror registry that mirrors the OpenShift Container Platform release images and Operator Lifecycle Manager (OLM) catalog that contains the required Operator images. OLM manages, installs, and upgrades Operators and their dependencies in the cluster. You can also use a disconnected mirror host to serve the RHCOS ISO and RootFS disk images that are used to provision the bare-metal hosts.
2.1. Telco RAN DU 4.17 validated software components
The Red Hat telco RAN DU 4.17 solution has been validated using the following Red Hat software products for OpenShift Container Platform managed clusters and hub clusters.
Component | Software version |
---|---|
Managed cluster version | 4.17 |
Cluster Logging Operator | 6.0 |
Local Storage Operator | 4.17 |
OpenShift API for Data Protection (OADP) | 1.4.1 |
PTP Operator | 4.17 |
SRIOV Operator | 4.17 |
SRIOV-FEC Operator | 2.9 |
Lifecycle Agent | 4.17 |
Component | Software version |
---|---|
Hub cluster version | 4.17 |
Red Hat Advanced Cluster Management (RHACM) | 2.11 |
GitOps ZTP plugin | 4.17 |
Red Hat OpenShift GitOps | 1.13 |
Topology Aware Lifecycle Manager (TALM) | 4.17 |
2.2. Recommended hub cluster specifications and managed cluster limits for GitOps ZTP
With GitOps Zero Touch Provisioning (ZTP), you can manage thousands of clusters in geographically dispersed regions and networks. The Red Hat Performance and Scale lab successfully created and managed 3500 virtual single-node OpenShift clusters with a reduced DU profile from a single Red Hat Advanced Cluster Management (RHACM) hub cluster in a lab environment.
In real-world situations, the scaling limits for the number of clusters that you can manage will vary depending on various factors affecting the hub cluster. For example:
- Hub cluster resources
- Available hub cluster host resources (CPU, memory, storage) are an important factor in determining how many clusters the hub cluster can manage. The more resources allocated to the hub cluster, the more managed clusters it can accommodate.
- Hub cluster storage
- The hub cluster host storage IOPS rating and whether the hub cluster hosts use NVMe storage can affect hub cluster performance and the number of clusters it can manage.
- Network bandwidth and latency
- Slow or high-latency network connections between the hub cluster and managed clusters can impact how the hub cluster manages multiple clusters.
- Managed cluster size and complexity
- The size and complexity of the managed clusters also affects the capacity of the hub cluster. Larger managed clusters with more nodes, namespaces, and resources require additional processing and management resources. Similarly, clusters with complex configurations such as the RAN DU profile or diverse workloads can require more resources from the hub cluster.
- Number of managed policies
- The number of policies managed by the hub cluster scaled over the number of managed clusters bound to those policies is an important factor that determines how many clusters can be managed.
- Monitoring and management workloads
- RHACM continuously monitors and manages the managed clusters. The number and complexity of monitoring and management workloads running on the hub cluster can affect its capacity. Intensive monitoring or frequent reconciliation operations can require additional resources, potentially limiting the number of manageable clusters.
- RHACM version and configuration
- Different versions of RHACM can have varying performance characteristics and resource requirements. Additionally, the configuration settings of RHACM, such as the number of concurrent reconciliations or the frequency of health checks, can affect the managed cluster capacity of the hub cluster.
Use the following representative configuration and network specifications to develop your own Hub cluster and network specifications.
The following guidelines are based on internal lab benchmark testing only and do not represent complete bare-metal host specifications.
Requirement | Description |
---|---|
OpenShift Container Platform | version 4.13 |
RHACM | version 2.7 |
Topology Aware Lifecycle Manager (TALM) | version 4.13 |
Server hardware | 3 x Dell PowerEdge R650 rack servers |
NVMe hard disks |
|
SSD hard disks |
|
Number of applied DU profile policies | 5 |
The following network specifications are representative of a typical real-world RAN network and were applied to the scale lab environment during testing.
Specification | Description |
---|---|
Round-trip time (RTT) latency | 50 ms |
Packet loss | 0.02% packet loss |
Network bandwidth limit | 20 Mbps |
Additional resources
2.3. Installing GitOps ZTP in a disconnected environment
Use Red Hat Advanced Cluster Management (RHACM), Red Hat OpenShift GitOps, and Topology Aware Lifecycle Manager (TALM) on the hub cluster in the disconnected environment to manage the deployment of multiple managed clusters.
Prerequisites
-
You have installed the OpenShift Container Platform CLI (
oc
). -
You have logged in as a user with
cluster-admin
privileges. You have configured a disconnected mirror registry for use in the cluster.
NoteThe disconnected mirror registry that you create must contain a version of TALM backup and pre-cache images that matches the version of TALM running in the hub cluster. The spoke cluster must be able to resolve these images in the disconnected mirror registry.
Procedure
- Install RHACM in the hub cluster. See Installing RHACM in a disconnected environment.
- Install GitOps and TALM in the hub cluster.
Additional resources
2.4. Adding RHCOS ISO and RootFS images to the disconnected mirror host
Before you begin installing clusters in the disconnected environment with Red Hat Advanced Cluster Management (RHACM), you must first host Red Hat Enterprise Linux CoreOS (RHCOS) images for it to use. Use a disconnected mirror to host the RHCOS images.
Prerequisites
- Deploy and configure an HTTP server to host the RHCOS image resources on the network. You must be able to access the HTTP server from your computer, and from the machines that you create.
The RHCOS images might not change with every release of OpenShift Container Platform. You must download images with the highest version that is less than or equal to the version that you install. Use the image versions that match your OpenShift Container Platform version if they are available. You require ISO and RootFS images to install RHCOS on the hosts. RHCOS QCOW2 images are not supported for this installation type.
Procedure
- Log in to the mirror host.
Obtain the RHCOS ISO and RootFS images from mirror.openshift.com, for example:
Export the required image names and OpenShift Container Platform version as environment variables:
$ export ISO_IMAGE_NAME=<iso_image_name> 1
$ export ROOTFS_IMAGE_NAME=<rootfs_image_name> 1
$ export OCP_VERSION=<ocp_version> 1
Download the required images:
$ sudo wget https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/4.17/${OCP_VERSION}/${ISO_IMAGE_NAME} -O /var/www/html/${ISO_IMAGE_NAME}
$ sudo wget https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/4.17/${OCP_VERSION}/${ROOTFS_IMAGE_NAME} -O /var/www/html/${ROOTFS_IMAGE_NAME}
Verification steps
Verify that the images downloaded successfully and are being served on the disconnected mirror host, for example:
$ wget http://$(hostname)/${ISO_IMAGE_NAME}
Example output
Saving to: rhcos-4.17.1-x86_64-live.x86_64.iso rhcos-4.17.1-x86_64-live.x86_64.iso- 11%[====> ] 10.01M 4.71MB/s
Additional resources
2.5. Enabling the assisted service
Red Hat Advanced Cluster Management (RHACM) uses the assisted service to deploy OpenShift Container Platform clusters. The assisted service is deployed automatically when you enable the MultiClusterHub Operator on Red Hat Advanced Cluster Management (RHACM). After that, you need to configure the Provisioning
resource to watch all namespaces and to update the AgentServiceConfig
custom resource (CR) with references to the ISO and RootFS images that are hosted on the mirror registry HTTP server.
Prerequisites
-
You have installed the OpenShift CLI (
oc
). -
You have logged in to the hub cluster as a user with
cluster-admin
privileges. - You have RHACM with MultiClusterHub enabled.
Procedure
-
Enable the
Provisioning
resource to watch all namespaces and configure mirrors for disconnected environments. For more information, see Enabling the central infrastructure management service. Update the
AgentServiceConfig
CR by running the following command:$ oc edit AgentServiceConfig
Add the following entry to the
items.spec.osImages
field in the CR:- cpuArchitecture: x86_64 openshiftVersion: "4.17" rootFSUrl: https://<host>/<path>/rhcos-live-rootfs.x86_64.img url: https://<mirror-registry>/<path>/rhcos-live.x86_64.iso
where:
- <host>
- Is the fully qualified domain name (FQDN) for the target mirror registry HTTP server.
- <path>
- Is the path to the image on the target mirror registry.
Save and quit the editor to apply the changes.
2.6. Configuring the hub cluster to use a disconnected mirror registry
You can configure the hub cluster to use a disconnected mirror registry for a disconnected environment.
Prerequisites
- You have a disconnected hub cluster installation with Red Hat Advanced Cluster Management (RHACM) 2.11 installed.
-
You have hosted the
rootfs
andiso
images on an HTTP server. See the Additional resources section for guidance about Mirroring the OpenShift Container Platform image repository.
If you enable TLS for the HTTP server, you must confirm the root certificate is signed by an authority trusted by the client and verify the trusted certificate chain between your OpenShift Container Platform hub and managed clusters and the HTTP server. Using a server configured with an untrusted certificate prevents the images from being downloaded to the image creation service. Using untrusted HTTPS servers is not supported.
Procedure
Create a
ConfigMap
containing the mirror registry config:apiVersion: v1 kind: ConfigMap metadata: name: assisted-installer-mirror-config namespace: multicluster-engine 1 labels: app: assisted-service data: ca-bundle.crt: | 2 -----BEGIN CERTIFICATE----- <certificate_contents> -----END CERTIFICATE----- registries.conf: | 3 unqualified-search-registries = ["registry.access.redhat.com", "docker.io"] [[registry]] prefix = "" location = "quay.io/example-repository" 4 mirror-by-digest-only = true [[registry.mirror]] location = "mirror1.registry.corp.com:5000/example-repository" 5
- 1
- The
ConfigMap
namespace must be set tomulticluster-engine
. - 2
- The mirror registry’s certificate that is used when creating the mirror registry.
- 3
- The configuration file for the mirror registry. The mirror registry configuration adds mirror information to the
/etc/containers/registries.conf
file in the discovery image. The mirror information is stored in theimageContentSources
section of theinstall-config.yaml
file when the information is passed to the installation program. The Assisted Service pod that runs on the hub cluster fetches the container images from the configured mirror registry. - 4
- The URL of the mirror registry. You must use the URL from the
imageContentSources
section by running theoc adm release mirror
command when you configure the mirror registry. For more information, see the Mirroring the OpenShift Container Platform image repository section. - 5
- The registries defined in the
registries.conf
file must be scoped by repository, not by registry. In this example, both thequay.io/example-repository
and themirror1.registry.corp.com:5000/example-repository
repositories are scoped by theexample-repository
repository.
This updates
mirrorRegistryRef
in theAgentServiceConfig
custom resource, as shown below:Example output
apiVersion: agent-install.openshift.io/v1beta1 kind: AgentServiceConfig metadata: name: agent namespace: multicluster-engine 1 spec: databaseStorage: volumeName: <db_pv_name> accessModes: - ReadWriteOnce resources: requests: storage: <db_storage_size> filesystemStorage: volumeName: <fs_pv_name> accessModes: - ReadWriteOnce resources: requests: storage: <fs_storage_size> mirrorRegistryRef: name: assisted-installer-mirror-config 2 osImages: - openshiftVersion: <ocp_version> url: <iso_url> 3
A valid NTP server is required during cluster installation. Ensure that a suitable NTP server is available and can be reached from the installed clusters through the disconnected network.
Additional resources
2.7. Configuring the hub cluster to use unauthenticated registries
You can configure the hub cluster to use unauthenticated registries. Unauthenticated registries does not require authentication to access and download images.
Prerequisites
- You have installed and configured a hub cluster and installed Red Hat Advanced Cluster Management (RHACM) on the hub cluster.
- You have installed the OpenShift Container Platform CLI (oc).
-
You have logged in as a user with
cluster-admin
privileges. - You have configured an unauthenticated registry for use with the hub cluster.
Procedure
Update the
AgentServiceConfig
custom resource (CR) by running the following command:$ oc edit AgentServiceConfig agent
Add the
unauthenticatedRegistries
field in the CR:apiVersion: agent-install.openshift.io/v1beta1 kind: AgentServiceConfig metadata: name: agent spec: unauthenticatedRegistries: - example.registry.com - example.registry2.com ...
Unauthenticated registries are listed under
spec.unauthenticatedRegistries
in theAgentServiceConfig
resource. Any registry on this list is not required to have an entry in the pull secret used for the spoke cluster installation.assisted-service
validates the pull secret by making sure it contains the authentication information for every image registry used for installation.
Mirror registries are automatically added to the ignore list and do not need to be added under spec.unauthenticatedRegistries
. Specifying the PUBLIC_CONTAINER_REGISTRIES
environment variable in the ConfigMap
overrides the default values with the specified value. The PUBLIC_CONTAINER_REGISTRIES
defaults are quay.io and registry.svc.ci.openshift.org.
Verification
Verify that you can access the newly added registry from the hub cluster by running the following commands:
Open a debug shell prompt to the hub cluster:
$ oc debug node/<node_name>
Test access to the unauthenticated registry by running the following command:
sh-4.4# podman login -u kubeadmin -p $(oc whoami -t) <unauthenticated_registry>
where:
- <unauthenticated_registry>
-
Is the new registry, for example,
unauthenticated-image-registry.openshift-image-registry.svc:5000
.
Example output
Login Succeeded!
2.8. Configuring the hub cluster with ArgoCD
You can configure the hub cluster with a set of ArgoCD applications that generate the required installation and policy custom resources (CRs) for each site with GitOps Zero Touch Provisioning (ZTP).
Red Hat Advanced Cluster Management (RHACM) uses SiteConfig
CRs to generate the Day 1 managed cluster installation CRs for ArgoCD. Each ArgoCD application can manage a maximum of 300 SiteConfig
CRs.
Prerequisites
- You have a OpenShift Container Platform hub cluster with Red Hat Advanced Cluster Management (RHACM) and Red Hat OpenShift GitOps installed.
-
You have extracted the reference deployment from the GitOps ZTP plugin container as described in the "Preparing the GitOps ZTP site configuration repository" section. Extracting the reference deployment creates the
out/argocd/deployment
directory referenced in the following procedure.
Procedure
Prepare the ArgoCD pipeline configuration:
- Create a Git repository with the directory structure similar to the example directory. For more information, see "Preparing the GitOps ZTP site configuration repository".
Configure access to the repository using the ArgoCD UI. Under Settings configure the following:
-
Repositories - Add the connection information. The URL must end in
.git
, for example,https://repo.example.com/repo.git
and credentials. - Certificates - Add the public certificate for the repository, if needed.
-
Repositories - Add the connection information. The URL must end in
Modify the two ArgoCD applications,
out/argocd/deployment/clusters-app.yaml
andout/argocd/deployment/policies-app.yaml
, based on your Git repository:-
Update the URL to point to the Git repository. The URL ends with
.git
, for example,https://repo.example.com/repo.git
. -
The
targetRevision
indicates which Git repository branch to monitor. -
path
specifies the path to theSiteConfig
andPolicyGenerator
orPolicyGentemplate
CRs, respectively.
-
Update the URL to point to the Git repository. The URL ends with
To install the GitOps ZTP plugin, patch the ArgoCD instance in the hub cluster with the relevant multicluster engine (MCE) subscription image. Customize the patch file that you previously extracted into the
out/argocd/deployment/
directory for your environment.Select the
multicluster-operators-subscription
image that matches your RHACM version.Table 2.5. multicluster-operators-subscription image versions OpenShift Container Platform version RHACM version MCE version MCE RHEL version MCE image 4.14, 4.15, 4.16
2.8, 2.9
2.8, 2.9
RHEL 8
registry.redhat.io/rhacm2/multicluster-operators-subscription-rhel8:v2.8
registry.redhat.io/rhacm2/multicluster-operators-subscription-rhel8:v2.9
4.14, 4.15, 4.16
2.10
2.10
RHEL 9
registry.redhat.io/rhacm2/multicluster-operators-subscription-rhel9:v2.10
ImportantThe version of the
multicluster-operators-subscription
image should match the RHACM version. Beginning with the MCE 2.10 release, RHEL 9 is the base image formulticluster-operators-subscription
images.Add the following configuration to the
out/argocd/deployment/argocd-openshift-gitops-patch.json
file:{ "args": [ "-c", "mkdir -p /.config/kustomize/plugin/policy.open-cluster-management.io/v1/policygenerator && cp /policy-generator/PolicyGenerator-not-fips-compliant /.config/kustomize/plugin/policy.open-cluster-management.io/v1/policygenerator/PolicyGenerator" 1 ], "command": [ "/bin/bash" ], "image": "registry.redhat.io/rhacm2/multicluster-operators-subscription-rhel9:v2.10", 2 3 "name": "policy-generator-install", "imagePullPolicy": "Always", "volumeMounts": [ { "mountPath": "/.config", "name": "kustomize" } ] }
- 1
- Optional: For RHEL 9 images, copy the required universal executable in the
/policy-generator/PolicyGenerator-not-fips-compliant
folder for the ArgoCD version. - 2
- Match the
multicluster-operators-subscription
image to the RHACM version. - 3
- In disconnected environments, replace the URL for the
multicluster-operators-subscription
image with the disconnected registry equivalent for your environment.
Patch the ArgoCD instance. Run the following command:
$ oc patch argocd openshift-gitops \ -n openshift-gitops --type=merge \ --patch-file out/argocd/deployment/argocd-openshift-gitops-patch.json
In RHACM 2.7 and later, the multicluster engine enables the
cluster-proxy-addon
feature by default. Apply the following patch to disable thecluster-proxy-addon
feature and remove the relevant hub cluster and managed pods that are responsible for this add-on. Run the following command:$ oc patch multiclusterengines.multicluster.openshift.io multiclusterengine --type=merge --patch-file out/argocd/deployment/disable-cluster-proxy-addon.json
Apply the pipeline configuration to your hub cluster by running the following command:
$ oc apply -k out/argocd/deployment
2.9. Preparing the GitOps ZTP site configuration repository
Before you can use the GitOps Zero Touch Provisioning (ZTP) pipeline, you need to prepare the Git repository to host the site configuration data.
Prerequisites
- You have configured the hub cluster GitOps applications for generating the required installation and policy custom resources (CRs).
- You have deployed the managed clusters using GitOps ZTP.
Procedure
Create a directory structure with separate paths for the
SiteConfig
andPolicyGenerator
orPolicyGentemplate
CRs.NoteKeep
SiteConfig
andPolicyGenerator
orPolicyGentemplate
CRs in separate directories. Both theSiteConfig
andPolicyGenerator
orPolicyGentemplate
directories must contain akustomization.yaml
file that explicitly includes the files in that directory.Export the
argocd
directory from theztp-site-generate
container image using the following commands:$ podman pull registry.redhat.io/openshift4/ztp-site-generate-rhel8:v4.17
$ mkdir -p ./out
$ podman run --log-driver=none --rm registry.redhat.io/openshift4/ztp-site-generate-rhel8:v4.17 extract /home/ztp --tar | tar x -C ./out
Check that the
out
directory contains the following subdirectories:-
out/extra-manifest
contains the source CR files thatSiteConfig
uses to generate extra manifestconfigMap
. -
out/source-crs
contains the source CR files thatPolicyGenerator
uses to generate the Red Hat Advanced Cluster Management (RHACM) policies. -
out/argocd/deployment
contains patches and YAML files to apply on the hub cluster for use in the next step of this procedure. -
out/argocd/example
contains the examples forSiteConfig
andPolicyGenerator
orPolicyGentemplate
files that represent the recommended configuration.
-
-
Copy the
out/source-crs
folder and contents to thePolicyGenerator
orPolicyGentemplate
directory. The out/extra-manifests directory contains the reference manifests for a RAN DU cluster. Copy the
out/extra-manifests
directory into theSiteConfig
folder. This directory should contain CRs from theztp-site-generate
container only. Do not add user-provided CRs here. If you want to work with user-provided CRs you must create another directory for that content. For example:example/ ├── acmpolicygenerator │ ├── kustomization.yaml │ └── source-crs/ ├── policygentemplates 1 │ ├── kustomization.yaml │ └── source-crs/ └── siteconfig ├── extra-manifests └── kustomization.yaml
- 1
- Using
PolicyGenTemplate
CRs to manage and deploy polices to manage clusters will be deprecated in a future OpenShift Container Platform release. Equivalent and improved functionality is available by using Red Hat Advanced Cluster Management (RHACM) andPolicyGenerator
CRs.
-
Commit the directory structure and the
kustomization.yaml
files and push to your Git repository. The initial push to Git should include thekustomization.yaml
files.
You can use the directory structure under out/argocd/example
as a reference for the structure and content of your Git repository. That structure includes SiteConfig
and PolicyGenerator
or PolicyGentemplate
reference CRs for single-node, three-node, and standard clusters. Remove references to cluster types that you are not using.
For all cluster types, you must:
-
Add the
source-crs
subdirectory to theacmpolicygenerator
orpolicygentemplates
directory. -
Add the
extra-manifests
directory to thesiteconfig
directory.
The following example describes a set of CRs for a network of single-node clusters:
example/ ├── acmpolicygenerator │ ├── acm-common-ranGen.yaml │ ├── acm-example-sno-site.yaml │ ├── acm-group-du-sno-ranGen.yaml │ ├── group-du-sno-validator-ranGen.yaml │ ├── kustomization.yaml │ ├── source-crs/ │ └── ns.yaml └── siteconfig ├── example-sno.yaml ├── extra-manifests/ 1 ├── custom-manifests/ 2 ├── KlusterletAddonConfigOverride.yaml └── kustomization.yaml
Using PolicyGenTemplate
CRs to manage and deploy polices to managed clusters will be deprecated in an upcoming OpenShift Container Platform release. Equivalent and improved functionality is available using Red Hat Advanced Cluster Management (RHACM) and PolicyGenerator
CRs.
For more information about PolicyGenerator
resources, see the RHACM Policy Generator documentation.
2.10. Preparing the GitOps ZTP site configuration repository for version independence
You can use GitOps ZTP to manage source custom resources (CRs) for managed clusters that are running different versions of OpenShift Container Platform. This means that the version of OpenShift Container Platform running on the hub cluster can be independent of the version running on the managed clusters.
The following procedure assumes you are using PolicyGenerator
resources instead of PolicyGentemplate
resources for cluster policies management.
Prerequisites
-
You have installed the OpenShift CLI (
oc
). -
You have logged in as a user with
cluster-admin
privileges.
Procedure
-
Create a directory structure with separate paths for the
SiteConfig
andPolicyGenerator
CRs. Within the
PolicyGenerator
directory, create a directory for each OpenShift Container Platform version you want to make available. For each version, create the following resources:-
kustomization.yaml
file that explicitly includes the files in that directory source-crs
directory to contain reference CR configuration files from theztp-site-generate
containerIf you want to work with user-provided CRs, you must create a separate directory for them.
-
In the
/siteconfig
directory, create a subdirectory for each OpenShift Container Platform version you want to make available. For each version, create at least one directory for reference CRs to be copied from the container. There is no restriction on the naming of directories or on the number of reference directories. If you want to work with custom manifests, you must create a separate directory for them.The following example describes a structure using user-provided manifests and CRs for different versions of OpenShift Container Platform:
├── acmpolicygenerator │ ├── kustomization.yaml 1 │ ├── version_4.13 2 │ │ ├── common-ranGen.yaml │ │ ├── group-du-sno-ranGen.yaml │ │ ├── group-du-sno-validator-ranGen.yaml │ │ ├── helix56-v413.yaml │ │ ├── kustomization.yaml 3 │ │ ├── ns.yaml │ │ └── source-crs/ 4 │ │ └── reference-crs/ 5 │ │ └── custom-crs/ 6 │ └── version_4.14 7 │ ├── common-ranGen.yaml │ ├── group-du-sno-ranGen.yaml │ ├── group-du-sno-validator-ranGen.yaml │ ├── helix56-v414.yaml │ ├── kustomization.yaml 8 │ ├── ns.yaml │ └── source-crs/ 9 │ └── reference-crs/ 10 │ └── custom-crs/ 11 └── siteconfig ├── kustomization.yaml ├── version_4.13 │ ├── helix56-v413.yaml │ ├── kustomization.yaml │ ├── extra-manifest/ 12 │ └── custom-manifest/ 13 └── version_4.14 ├── helix57-v414.yaml ├── kustomization.yaml ├── extra-manifest/ 14 └── custom-manifest/ 15
- 1
- Create a top-level
kustomization
YAML file. - 2 7
- Create the version-specific directories within the custom
/acmpolicygenerator
directory. - 3 8
- Create a
kustomization.yaml
file for each version. - 4 9
- Create a
source-crs
directory for each version to contain reference CRs from theztp-site-generate
container. - 5 10
- Create the
reference-crs
directory for policy CRs that are extracted from the ZTP container. - 6 11
- Optional: Create a
custom-crs
directory for user-provided CRs. - 12 14
- Create a directory within the custom
/siteconfig
directory to contain extra manifests from theztp-site-generate
container. - 13 15
- Create a folder to hold user-provided manifests.
NoteIn the previous example, each version subdirectory in the custom
/siteconfig
directory contains two further subdirectories, one containing the reference manifests copied from the container, the other for custom manifests that you provide. The names assigned to those directories are examples. If you use user-provided CRs, the last directory listed underextraManifests.searchPaths
in theSiteConfig
CR must be the directory containing user-provided CRs.Edit the
SiteConfig
CR to include the search paths of any directories you have created. The first directory that is listed underextraManifests.searchPaths
must be the directory containing the reference manifests. Consider the order in which the directories are listed. In cases where directories contain files with the same name, the file in the final directory takes precedence.Example SiteConfig CR
extraManifests: searchPaths: - extra-manifest/ 1 - custom-manifest/ 2
Edit the top-level
kustomization.yaml
file to control which OpenShift Container Platform versions are active. The following is an example of akustomization.yaml
file at the top level:resources: - version_4.13 1 #- version_4.14 2
Chapter 3. Updating GitOps ZTP
You can update the GitOps Zero Touch Provisioning (ZTP) infrastructure independently from the hub cluster, Red Hat Advanced Cluster Management (RHACM), and the managed OpenShift Container Platform clusters.
You can update the Red Hat OpenShift GitOps Operator when new versions become available. When updating the GitOps ZTP plugin, review the updated files in the reference configuration and ensure that the changes meet your requirements.
Using PolicyGenTemplate
CRs to manage and deploy polices to managed clusters will be deprecated in an upcoming OpenShift Container Platform release. Equivalent and improved functionality is available using Red Hat Advanced Cluster Management (RHACM) and PolicyGenerator
CRs.
For more information about PolicyGenerator
resources, see the RHACM Policy Generator documentation.
Additional resources
3.1. Overview of the GitOps ZTP update process
You can update GitOps Zero Touch Provisioning (ZTP) for a fully operational hub cluster running an earlier version of the GitOps ZTP infrastructure. The update process avoids impact on managed clusters.
Any changes to policy settings, including adding recommended content, results in updated polices that must be rolled out to the managed clusters and reconciled.
At a high level, the strategy for updating the GitOps ZTP infrastructure is as follows:
-
Label all existing clusters with the
ztp-done
label. - Stop the ArgoCD applications.
- Install the new GitOps ZTP tools.
- Update required content and optional changes in the Git repository.
- Update and restart the application configuration.
3.2. Preparing for the upgrade
Use the following procedure to prepare your site for the GitOps Zero Touch Provisioning (ZTP) upgrade.
Procedure
- Get the latest version of the GitOps ZTP container that has the custom resources (CRs) used to configure Red Hat OpenShift GitOps for use with GitOps ZTP.
Extract the
argocd/deployment
directory by using the following commands:$ mkdir -p ./update
$ podman run --log-driver=none --rm registry.redhat.io/openshift4/ztp-site-generate-rhel8:v4.17 extract /home/ztp --tar | tar x -C ./update
The
/update
directory contains the following subdirectories:-
update/extra-manifest
: contains the source CR files that theSiteConfig
CR uses to generate the extra manifestconfigMap
. -
update/source-crs
: contains the source CR files that thePolicyGenerator
orPolicyGentemplate
CR uses to generate the Red Hat Advanced Cluster Management (RHACM) policies. -
update/argocd/deployment
: contains patches and YAML files to apply on the hub cluster for use in the next step of this procedure. -
update/argocd/example
: contains exampleSiteConfig
andPolicyGenerator
orPolicyGentemplate
files that represent the recommended configuration.
-
Update the
clusters-app.yaml
andpolicies-app.yaml
files to reflect the name of your applications and the URL, branch, and path for your Git repository.If the upgrade includes changes that results in obsolete policies, the obsolete policies should be removed prior to performing the upgrade.
Diff the changes between the configuration and deployment source CRs in the
/update
folder and Git repo where you manage your fleet site CRs. Apply and push the required changes to your site repository.ImportantWhen you update GitOps ZTP to the latest version, you must apply the changes from the
update/argocd/deployment
directory to your site repository. Do not use older versions of theargocd/deployment/
files.
3.3. Labeling the existing clusters
To ensure that existing clusters remain untouched by the tool updates, label all existing managed clusters with the ztp-done
label.
This procedure only applies when updating clusters that were not provisioned with Topology Aware Lifecycle Manager (TALM). Clusters that you provision with TALM are automatically labeled with ztp-done
.
Procedure
Find a label selector that lists the managed clusters that were deployed with GitOps Zero Touch Provisioning (ZTP), such as
local-cluster!=true
:$ oc get managedcluster -l 'local-cluster!=true'
Ensure that the resulting list contains all the managed clusters that were deployed with GitOps ZTP, and then use that selector to add the
ztp-done
label:$ oc label managedcluster -l 'local-cluster!=true' ztp-done=
3.4. Stopping the existing GitOps ZTP applications
Removing the existing applications ensures that any changes to existing content in the Git repository are not rolled out until the new version of the tools is available.
Use the application files from the deployment
directory. If you used custom names for the applications, update the names in these files first.
Procedure
Perform a non-cascaded delete on the
clusters
application to leave all generated resources in place:$ oc delete -f update/argocd/deployment/clusters-app.yaml
Perform a cascaded delete on the
policies
application to remove all previous policies:$ oc patch -f policies-app.yaml -p '{"metadata": {"finalizers": ["resources-finalizer.argocd.argoproj.io"]}}' --type merge
$ oc delete -f update/argocd/deployment/policies-app.yaml
3.5. Required changes to the Git repository
When upgrading the ztp-site-generate
container from an earlier release of GitOps Zero Touch Provisioning (ZTP) to 4.10 or later, there are additional requirements for the contents of the Git repository. Existing content in the repository must be updated to reflect these changes.
The following procedure assumes you are using PolicyGenerator
resources instead of PolicyGentemplate
resources for cluster policies management.
Make required changes to
PolicyGenerator
files:All
PolicyGenerator
files must be created in aNamespace
prefixed withztp
. This ensures that the GitOps ZTP application is able to manage the policy CRs generated by GitOps ZTP without conflicting with the way Red Hat Advanced Cluster Management (RHACM) manages the policies internally.Add the
kustomization.yaml
file to the repository:All
SiteConfig
andPolicyGenerator
CRs must be included in akustomization.yaml
file under their respective directory trees. For example:├── acmpolicygenerator │ ├── site1-ns.yaml │ ├── site1.yaml │ ├── site2-ns.yaml │ ├── site2.yaml │ ├── common-ns.yaml │ ├── common-ranGen.yaml │ ├── group-du-sno-ranGen-ns.yaml │ ├── group-du-sno-ranGen.yaml │ └── kustomization.yaml └── siteconfig ├── site1.yaml ├── site2.yaml └── kustomization.yaml
NoteThe files listed in the
generator
sections must contain eitherSiteConfig
or{policy-gen-cr}
CRs only. If your existing YAML files contain other CRs, for example,Namespace
, these other CRs must be pulled out into separate files and listed in theresources
section.The
PolicyGenerator
kustomization file must contain allPolicyGenerator
YAML files in thegenerator
section andNamespace
CRs in theresources
section. For example:apiVersion: kustomize.config.k8s.io/v1beta1 kind: Kustomization generators: - acm-common-ranGen.yaml - acm-group-du-sno-ranGen.yaml - site1.yaml - site2.yaml resources: - common-ns.yaml - acm-group-du-sno-ranGen-ns.yaml - site1-ns.yaml - site2-ns.yaml
The
SiteConfig
kustomization file must contain allSiteConfig
YAML files in thegenerator
section and any other CRs in the resources:apiVersion: kustomize.config.k8s.io/v1beta1 kind: Kustomization generators: - site1.yaml - site2.yaml
Remove the
pre-sync.yaml
andpost-sync.yaml
files.In OpenShift Container Platform 4.10 and later, the
pre-sync.yaml
andpost-sync.yaml
files are no longer required. Theupdate/deployment/kustomization.yaml
CR manages the policies deployment on the hub cluster.NoteThere is a set of
pre-sync.yaml
andpost-sync.yaml
files under both theSiteConfig
and{policy-gen-cr}
trees.Review and incorporate recommended changes
Each release may include additional recommended changes to the configuration applied to deployed clusters. Typically these changes result in lower CPU use by the OpenShift platform, additional features, or improved tuning of the platform.
Review the reference
SiteConfig
andPolicyGenerator
CRs applicable to the types of cluster in your network. These examples can be found in theargocd/example
directory extracted from the GitOps ZTP container.
3.6. Installing the new GitOps ZTP applications
Using the extracted argocd/deployment
directory, and after ensuring that the applications point to your site Git repository, apply the full contents of the deployment directory. Applying the full contents of the directory ensures that all necessary resources for the applications are correctly configured.
Procedure
To install the GitOps ZTP plugin, patch the ArgoCD instance in the hub cluster with the relevant multicluster engine (MCE) subscription image. Customize the patch file that you previously extracted into the
out/argocd/deployment/
directory for your environment.Select the
multicluster-operators-subscription
image that matches your RHACM version.Table 3.1. multicluster-operators-subscription image versions OpenShift Container Platform version RHACM version MCE version MCE RHEL version MCE image 4.14, 4.15, 4.16
2.8, 2.9
2.8, 2.9
RHEL 8
registry.redhat.io/rhacm2/multicluster-operators-subscription-rhel8:v2.8
registry.redhat.io/rhacm2/multicluster-operators-subscription-rhel8:v2.9
4.14, 4.15, 4.16
2.10
2.10
RHEL 9
registry.redhat.io/rhacm2/multicluster-operators-subscription-rhel9:v2.10
ImportantThe version of the
multicluster-operators-subscription
image should match the RHACM version. Beginning with the MCE 2.10 release, RHEL 9 is the base image formulticluster-operators-subscription
images.Add the following configuration to the
out/argocd/deployment/argocd-openshift-gitops-patch.json
file:{ "args": [ "-c", "mkdir -p /.config/kustomize/plugin/policy.open-cluster-management.io/v1/policygenerator && cp /policy-generator/PolicyGenerator-not-fips-compliant /.config/kustomize/plugin/policy.open-cluster-management.io/v1/policygenerator/PolicyGenerator" 1 ], "command": [ "/bin/bash" ], "image": "registry.redhat.io/rhacm2/multicluster-operators-subscription-rhel9:v2.10", 2 3 "name": "policy-generator-install", "imagePullPolicy": "Always", "volumeMounts": [ { "mountPath": "/.config", "name": "kustomize" } ] }
- 1
- Optional: For RHEL 9 images, copy the required universal executable in the
/policy-generator/PolicyGenerator-not-fips-compliant
folder for the ArgoCD version. - 2
- Match the
multicluster-operators-subscription
image to the RHACM version. - 3
- In disconnected environments, replace the URL for the
multicluster-operators-subscription
image with the disconnected registry equivalent for your environment.
Patch the ArgoCD instance. Run the following command:
$ oc patch argocd openshift-gitops \ -n openshift-gitops --type=merge \ --patch-file out/argocd/deployment/argocd-openshift-gitops-patch.json
In RHACM 2.7 and later, the multicluster engine enables the
cluster-proxy-addon
feature by default. Apply the following patch to disable thecluster-proxy-addon
feature and remove the relevant hub cluster and managed pods that are responsible for this add-on. Run the following command:$ oc patch multiclusterengines.multicluster.openshift.io multiclusterengine --type=merge --patch-file out/argocd/deployment/disable-cluster-proxy-addon.json
Apply the pipeline configuration to your hub cluster by running the following command:
$ oc apply -k out/argocd/deployment
3.7. Rolling out the GitOps ZTP configuration changes
If any configuration changes were included in the upgrade due to implementing recommended changes, the upgrade process results in a set of policy CRs on the hub cluster in the Non-Compliant
state. With the GitOps Zero Touch Provisioning (ZTP) version 4.10 and later ztp-site-generate
container, these policies are set to inform
mode and are not pushed to the managed clusters without an additional step by the user. This ensures that potentially disruptive changes to the clusters can be managed in terms of when the changes are made, for example, during a maintenance window, and how many clusters are updated concurrently.
To roll out the changes, create one or more ClusterGroupUpgrade
CRs as detailed in the TALM documentation. The CR must contain the list of Non-Compliant
policies that you want to push out to the managed clusters as well as a list or selector of which clusters should be included in the update.
Additional resources
- For information about the Topology Aware Lifecycle Manager (TALM), see About the Topology Aware Lifecycle Manager configuration.
-
For information about creating
ClusterGroupUpgrade
CRs, see About the auto-created ClusterGroupUpgrade CR for GitOps ZTP.
Chapter 4. Installing managed clusters with RHACM and SiteConfig resources
You can provision OpenShift Container Platform clusters at scale with Red Hat Advanced Cluster Management (RHACM) using the assisted service and the GitOps plugin policy generator with core-reduction technology enabled. The GitOps Zero Touch Provisioning (ZTP) pipeline performs the cluster installations. GitOps ZTP can be used in a disconnected environment.
Using PolicyGenTemplate
CRs to manage and deploy polices to managed clusters will be deprecated in an upcoming OpenShift Container Platform release. Equivalent and improved functionality is available using Red Hat Advanced Cluster Management (RHACM) and PolicyGenerator
CRs.
For more information about PolicyGenerator
resources, see the RHACM Policy Generator documentation.
Additional resources
4.1. GitOps ZTP and Topology Aware Lifecycle Manager
GitOps Zero Touch Provisioning (ZTP) generates installation and configuration CRs from manifests stored in Git. These artifacts are applied to a centralized hub cluster where Red Hat Advanced Cluster Management (RHACM), the assisted service, and the Topology Aware Lifecycle Manager (TALM) use the CRs to install and configure the managed cluster. The configuration phase of the GitOps ZTP pipeline uses the TALM to orchestrate the application of the configuration CRs to the cluster. There are several key integration points between GitOps ZTP and the TALM.
- Inform policies
-
By default, GitOps ZTP creates all policies with a remediation action of
inform
. These policies cause RHACM to report on compliance status of clusters relevant to the policies but does not apply the desired configuration. During the GitOps ZTP process, after OpenShift installation, the TALM steps through the createdinform
policies and enforces them on the target managed cluster(s). This applies the configuration to the managed cluster. Outside of the GitOps ZTP phase of the cluster lifecycle, this allows you to change policies without the risk of immediately rolling those changes out to affected managed clusters. You can control the timing and the set of remediated clusters by using TALM. - Automatic creation of ClusterGroupUpgrade CRs
To automate the initial configuration of newly deployed clusters, TALM monitors the state of all
ManagedCluster
CRs on the hub cluster. AnyManagedCluster
CR that does not have aztp-done
label applied, including newly createdManagedCluster
CRs, causes the TALM to automatically create aClusterGroupUpgrade
CR with the following characteristics:-
The
ClusterGroupUpgrade
CR is created and enabled in theztp-install
namespace. -
ClusterGroupUpgrade
CR has the same name as theManagedCluster
CR. -
The cluster selector includes only the cluster associated with that
ManagedCluster
CR. -
The set of managed policies includes all policies that RHACM has bound to the cluster at the time the
ClusterGroupUpgrade
is created. - Pre-caching is disabled.
- Timeout set to 4 hours (240 minutes).
The automatic creation of an enabled
ClusterGroupUpgrade
ensures that initial zero-touch deployment of clusters proceeds without the need for user intervention. Additionally, the automatic creation of aClusterGroupUpgrade
CR for anyManagedCluster
without theztp-done
label allows a failed GitOps ZTP installation to be restarted by simply deleting theClusterGroupUpgrade
CR for the cluster.-
The
- Waves
Each policy generated from a
PolicyGenerator
orPolicyGentemplate
CR includes aztp-deploy-wave
annotation. This annotation is based on the same annotation from each CR which is included in that policy. The wave annotation is used to order the policies in the auto-generatedClusterGroupUpgrade
CR. The wave annotation is not used other than for the auto-generatedClusterGroupUpgrade
CR.NoteAll CRs in the same policy must have the same setting for the
ztp-deploy-wave
annotation. The default value of this annotation for each CR can be overridden in thePolicyGenerator
orPolicyGentemplate
. The wave annotation in the source CR is used for determining and setting the policy wave annotation. This annotation is removed from each built CR which is included in the generated policy at runtime.The TALM applies the configuration policies in the order specified by the wave annotations. The TALM waits for each policy to be compliant before moving to the next policy. It is important to ensure that the wave annotation for each CR takes into account any prerequisites for those CRs to be applied to the cluster. For example, an Operator must be installed before or concurrently with the configuration for the Operator. Similarly, the
CatalogSource
for an Operator must be installed in a wave before or concurrently with the Operator Subscription. The default wave value for each CR takes these prerequisites into account.Multiple CRs and policies can share the same wave number. Having fewer policies can result in faster deployments and lower CPU usage. It is a best practice to group many CRs into relatively few waves.
To check the default wave value in each source CR, run the following command against the out/source-crs
directory that is extracted from the ztp-site-generate
container image:
$ grep -r "ztp-deploy-wave" out/source-crs
- Phase labels
The
ClusterGroupUpgrade
CR is automatically created and includes directives to annotate theManagedCluster
CR with labels at the start and end of the GitOps ZTP process.When GitOps ZTP configuration postinstallation commences, the
ManagedCluster
has theztp-running
label applied. When all policies are remediated to the cluster and are fully compliant, these directives cause the TALM to remove theztp-running
label and apply theztp-done
label.For deployments that make use of the
informDuValidator
policy, theztp-done
label is applied when the cluster is fully ready for deployment of applications. This includes all reconciliation and resulting effects of the GitOps ZTP applied configuration CRs. Theztp-done
label affects automaticClusterGroupUpgrade
CR creation by TALM. Do not manipulate this label after the initial GitOps ZTP installation of the cluster.- Linked CRs
-
The automatically created
ClusterGroupUpgrade
CR has the owner reference set as theManagedCluster
from which it was derived. This reference ensures that deleting theManagedCluster
CR causes the instance of theClusterGroupUpgrade
to be deleted along with any supporting resources.
4.2. Overview of deploying managed clusters with GitOps ZTP
Red Hat Advanced Cluster Management (RHACM) uses GitOps Zero Touch Provisioning (ZTP) to deploy single-node OpenShift Container Platform clusters, three-node clusters, and standard clusters. You manage site configuration data as OpenShift Container Platform custom resources (CRs) in a Git repository. GitOps ZTP uses a declarative GitOps approach for a develop once, deploy anywhere model to deploy the managed clusters.
The deployment of the clusters includes:
- Installing the host operating system (RHCOS) on a blank server
- Deploying OpenShift Container Platform
- Creating cluster policies and site subscriptions
- Making the necessary network configurations to the server operating system
- Deploying profile Operators and performing any needed software-related configuration, such as performance profile, PTP, and SR-IOV
Overview of the managed site installation process
After you apply the managed site custom resources (CRs) on the hub cluster, the following actions happen automatically:
- A Discovery image ISO file is generated and booted on the target host.
- When the ISO file successfully boots on the target host it reports the host hardware information to RHACM.
- After all hosts are discovered, OpenShift Container Platform is installed.
-
When OpenShift Container Platform finishes installing, the hub installs the
klusterlet
service on the target cluster. - The requested add-on services are installed on the target cluster.
The Discovery image ISO process is complete when the Agent
CR for the managed cluster is created on the hub cluster.
The target bare-metal host must meet the networking, firmware, and hardware requirements listed in Recommended single-node OpenShift cluster configuration for vDU application workloads.
4.3. Creating the managed bare-metal host secrets
Add the required Secret
custom resources (CRs) for the managed bare-metal host to the hub cluster. You need a secret for the GitOps Zero Touch Provisioning (ZTP) pipeline to access the Baseboard Management Controller (BMC) and a secret for the assisted installer service to pull cluster installation images from the registry.
The secrets are referenced from the SiteConfig
CR by name. The namespace must match the SiteConfig
namespace.
Procedure
Create a YAML secret file containing credentials for the host Baseboard Management Controller (BMC) and a pull secret required for installing OpenShift and all add-on cluster Operators:
Save the following YAML as the file
example-sno-secret.yaml
:apiVersion: v1 kind: Secret metadata: name: example-sno-bmc-secret namespace: example-sno 1 data: 2 password: <base64_password> username: <base64_username> type: Opaque --- apiVersion: v1 kind: Secret metadata: name: pull-secret namespace: example-sno 3 data: .dockerconfigjson: <pull_secret> 4 type: kubernetes.io/dockerconfigjson
-
Add the relative path to
example-sno-secret.yaml
to thekustomization.yaml
file that you use to install the cluster.
4.4. Configuring Discovery ISO kernel arguments for installations using GitOps ZTP
The GitOps Zero Touch Provisioning (ZTP) workflow uses the Discovery ISO as part of the OpenShift Container Platform installation process on managed bare-metal hosts. You can edit the InfraEnv
resource to specify kernel arguments for the Discovery ISO. This is useful for cluster installations with specific environmental requirements. For example, configure the rd.net.timeout.carrier
kernel argument for the Discovery ISO to facilitate static networking for the cluster or to receive a DHCP address before downloading the root file system during installation.
In OpenShift Container Platform 4.17, you can only add kernel arguments. You can not replace or delete kernel arguments.
Prerequisites
- You have installed the OpenShift CLI (oc).
- You have logged in to the hub cluster as a user with cluster-admin privileges.
Procedure
Create the
InfraEnv
CR and edit thespec.kernelArguments
specification to configure kernel arguments.Save the following YAML in an
InfraEnv-example.yaml
file:NoteThe
InfraEnv
CR in this example uses template syntax such as{{ .Cluster.ClusterName }}
that is populated based on values in theSiteConfig
CR. TheSiteConfig
CR automatically populates values for these templates during deployment. Do not edit the templates manually.apiVersion: agent-install.openshift.io/v1beta1 kind: InfraEnv metadata: annotations: argocd.argoproj.io/sync-wave: "1" name: "{{ .Cluster.ClusterName }}" namespace: "{{ .Cluster.ClusterName }}" spec: clusterRef: name: "{{ .Cluster.ClusterName }}" namespace: "{{ .Cluster.ClusterName }}" kernelArguments: - operation: append 1 value: audit=0 2 - operation: append value: trace=1 sshAuthorizedKey: "{{ .Site.SshPublicKey }}" proxy: "{{ .Cluster.ProxySettings }}" pullSecretRef: name: "{{ .Site.PullSecretRef.Name }}" ignitionConfigOverride: "{{ .Cluster.IgnitionConfigOverride }}" nmStateConfigLabelSelector: matchLabels: nmstate-label: "{{ .Cluster.ClusterName }}" additionalNTPSources: "{{ .Cluster.AdditionalNTPSources }}"
Commit the
InfraEnv-example.yaml
CR to the same location in your Git repository that has theSiteConfig
CR and push your changes. The following example shows a sample Git repository structure:~/example-ztp/install └── site-install ├── siteconfig-example.yaml ├── InfraEnv-example.yaml ...
Edit the
spec.clusters.crTemplates
specification in theSiteConfig
CR to reference theInfraEnv-example.yaml
CR in your Git repository:clusters: crTemplates: InfraEnv: "InfraEnv-example.yaml"
When you are ready to deploy your cluster by committing and pushing the
SiteConfig
CR, the build pipeline uses the customInfraEnv-example
CR in your Git repository to configure the infrastructure environment, including the custom kernel arguments.
Verification
To verify that the kernel arguments are applied, after the Discovery image verifies that OpenShift Container Platform is ready for installation, you can SSH to the target host before the installation process begins. At that point, you can view the kernel arguments for the Discovery ISO in the /proc/cmdline
file.
Begin an SSH session with the target host:
$ ssh -i /path/to/privatekey core@<host_name>
View the system’s kernel arguments by using the following command:
$ cat /proc/cmdline
4.5. Deploying a managed cluster with SiteConfig and GitOps ZTP
Use the following procedure to create a SiteConfig
custom resource (CR) and related files and initiate the GitOps Zero Touch Provisioning (ZTP) cluster deployment.
Prerequisites
-
You have installed the OpenShift CLI (
oc
). -
You have logged in to the hub cluster as a user with
cluster-admin
privileges. - You configured the hub cluster for generating the required installation and policy CRs.
You created a Git repository where you manage your custom site configuration data. The repository must be accessible from the hub cluster and you must configure it as a source repository for the ArgoCD application. See "Preparing the GitOps ZTP site configuration repository" for more information.
NoteWhen you create the source repository, ensure that you patch the ArgoCD application with the
argocd/deployment/argocd-openshift-gitops-patch.json
patch-file that you extract from theztp-site-generate
container. See "Configuring the hub cluster with ArgoCD".To be ready for provisioning managed clusters, you require the following for each bare-metal host:
- Network connectivity
- Your network requires DNS. Managed cluster hosts should be reachable from the hub cluster. Ensure that Layer 3 connectivity exists between the hub cluster and the managed cluster host.
- Baseboard Management Controller (BMC) details
-
GitOps ZTP uses BMC username and password details to connect to the BMC during cluster installation. The GitOps ZTP plugin manages the
ManagedCluster
CRs on the hub cluster based on theSiteConfig
CR in your site Git repo. You create individualBMCSecret
CRs for each host manually.
Procedure
Create the required managed cluster secrets on the hub cluster. These resources must be in a namespace with a name matching the cluster name. For example, in
out/argocd/example/siteconfig/example-sno.yaml
, the cluster name and namespace isexample-sno
.Export the cluster namespace by running the following command:
$ export CLUSTERNS=example-sno
Create the namespace:
$ oc create namespace $CLUSTERNS
Create pull secret and BMC
Secret
CRs for the managed cluster. The pull secret must contain all the credentials necessary for installing OpenShift Container Platform and all required Operators. See "Creating the managed bare-metal host secrets" for more information.NoteThe secrets are referenced from the
SiteConfig
custom resource (CR) by name. The namespace must match theSiteConfig
namespace.Create a
SiteConfig
CR for your cluster in your local clone of the Git repository:Choose the appropriate example for your CR from the
out/argocd/example/siteconfig/
folder. The folder includes example files for single node, three-node, and standard clusters:-
example-sno.yaml
-
example-3node.yaml
-
example-standard.yaml
-
Change the cluster and host details in the example file to match the type of cluster you want. For example:
Example single-node OpenShift SiteConfig CR
# example-node1-bmh-secret & assisted-deployment-pull-secret need to be created under same namespace example-sno --- apiVersion: ran.openshift.io/v1 kind: SiteConfig metadata: name: "example-sno" namespace: "example-sno" spec: baseDomain: "example.com" pullSecretRef: name: "assisted-deployment-pull-secret" clusterImageSetNameRef: "openshift-4.16" sshPublicKey: "ssh-rsa AAAA..." clusters: - clusterName: "example-sno" networkType: "OVNKubernetes" # installConfigOverrides is a generic way of passing install-config # parameters through the siteConfig. The 'capabilities' field configures # the composable openshift feature. In this 'capabilities' setting, we # remove all the optional set of components. # Notes: # - OperatorLifecycleManager is needed for 4.15 and later # - NodeTuning is needed for 4.13 and later, not for 4.12 and earlier # - Ingress is needed for 4.16 and later installConfigOverrides: | { "capabilities": { "baselineCapabilitySet": "None", "additionalEnabledCapabilities": [ "NodeTuning", "OperatorLifecycleManager", "Ingress" ] } } # It is strongly recommended to include crun manifests as part of the additional install-time manifests for 4.13+. # The crun manifests can be obtained from source-crs/optional-extra-manifest/ and added to the git repo ie.sno-extra-manifest. # extraManifestPath: sno-extra-manifest clusterLabels: # These example cluster labels correspond to the bindingRules in the PolicyGenTemplate examples du-profile: "latest" # These example cluster labels correspond to the bindingRules in the PolicyGenTemplate examples in ../policygentemplates: # ../policygentemplates/common-ranGen.yaml will apply to all clusters with 'common: true' common: true # ../policygentemplates/group-du-sno-ranGen.yaml will apply to all clusters with 'group-du-sno: ""' group-du-sno: "" # ../policygentemplates/example-sno-site.yaml will apply to all clusters with 'sites: "example-sno"' # Normally this should match or contain the cluster name so it only applies to a single cluster sites: "example-sno" clusterNetwork: - cidr: 1001:1::/48 hostPrefix: 64 machineNetwork: - cidr: 1111:2222:3333:4444::/64 serviceNetwork: - 1001:2::/112 additionalNTPSources: - 1111:2222:3333:4444::2 # Initiates the cluster for workload partitioning. Setting specific reserved/isolated CPUSets is done via PolicyTemplate # please see Workload Partitioning Feature for a complete guide. cpuPartitioningMode: AllNodes # Optionally; This can be used to override the KlusterletAddonConfig that is created for this cluster: #crTemplates: # KlusterletAddonConfig: "KlusterletAddonConfigOverride.yaml" nodes: - hostName: "example-node1.example.com" role: "master" # Optionally; This can be used to configure desired BIOS setting on a host: #biosConfigRef: # filePath: "example-hw.profile" bmcAddress: "idrac-virtualmedia+https://[1111:2222:3333:4444::bbbb:1]/redfish/v1/Systems/System.Embedded.1" bmcCredentialsName: name: "example-node1-bmh-secret" bootMACAddress: "AA:BB:CC:DD:EE:11" # Use UEFISecureBoot to enable secure boot. bootMode: "UEFISecureBoot" rootDeviceHints: deviceName: "/dev/disk/by-path/pci-0000:01:00.0-scsi-0:2:0:0" # disk partition at `/var/lib/containers` with ignitionConfigOverride. Some values must be updated. See DiskPartitionContainer.md for more details ignitionConfigOverride: | { "ignition": { "version": "3.2.0" }, "storage": { "disks": [ { "device": "/dev/disk/by-id/wwn-0x6b07b250ebb9d0002a33509f24af1f62", "partitions": [ { "label": "var-lib-containers", "sizeMiB": 0, "startMiB": 250000 } ], "wipeTable": false } ], "filesystems": [ { "device": "/dev/disk/by-partlabel/var-lib-containers", "format": "xfs", "mountOptions": [ "defaults", "prjquota" ], "path": "/var/lib/containers", "wipeFilesystem": true } ] }, "systemd": { "units": [ { "contents": "# Generated by Butane\n[Unit]\nRequires=systemd-fsck@dev-disk-by\\x2dpartlabel-var\\x2dlib\\x2dcontainers.service\nAfter=systemd-fsck@dev-disk-by\\x2dpartlabel-var\\x2dlib\\x2dcontainers.service\n\n[Mount]\nWhere=/var/lib/containers\nWhat=/dev/disk/by-partlabel/var-lib-containers\nType=xfs\nOptions=defaults,prjquota\n\n[Install]\nRequiredBy=local-fs.target", "enabled": true, "name": "var-lib-containers.mount" } ] } } nodeNetwork: interfaces: - name: eno1 macAddress: "AA:BB:CC:DD:EE:11" config: interfaces: - name: eno1 type: ethernet state: up ipv4: enabled: false ipv6: enabled: true address: # For SNO sites with static IP addresses, the node-specific, # API and Ingress IPs should all be the same and configured on # the interface - ip: 1111:2222:3333:4444::aaaa:1 prefix-length: 64 dns-resolver: config: search: - example.com server: - 1111:2222:3333:4444::2 routes: config: - destination: ::/0 next-hop-interface: eno1 next-hop-address: 1111:2222:3333:4444::1 table-id: 254
NoteFor more information about BMC addressing, see the "Additional resources" section. The
installConfigOverrides
andignitionConfigOverride
fields are expanded in the example for ease of readability.-
You can inspect the default set of extra-manifest
MachineConfig
CRs inout/argocd/extra-manifest
. It is automatically applied to the cluster when it is installed. Optional: To provision additional install-time manifests on the provisioned cluster, create a directory in your Git repository, for example,
sno-extra-manifest/
, and add your custom manifest CRs to this directory. If yourSiteConfig.yaml
refers to this directory in theextraManifestPath
field, any CRs in this referenced directory are appended to the default set of extra manifests.Enabling the crun OCI container runtimeFor optimal cluster performance, enable crun for master and worker nodes in single-node OpenShift, single-node OpenShift with additional worker nodes, three-node OpenShift, and standard clusters.
Enable crun in a
ContainerRuntimeConfig
CR as an additional Day 0 install-time manifest to avoid the cluster having to reboot.The
enable-crun-master.yaml
andenable-crun-worker.yaml
CR files are in theout/source-crs/optional-extra-manifest/
folder that you can extract from theztp-site-generate
container. For more information, see "Customizing extra installation manifests in the GitOps ZTP pipeline".
-
Add the
SiteConfig
CR to thekustomization.yaml
file in thegenerators
section, similar to the example shown inout/argocd/example/siteconfig/kustomization.yaml
. Commit the
SiteConfig
CR and associatedkustomization.yaml
changes in your Git repository and push the changes.The ArgoCD pipeline detects the changes and begins the managed cluster deployment.
Verification
Verify that the custom roles and labels are applied after the node is deployed:
$ oc describe node example-node.example.com
Example output
Name: example-node.example.com
Roles: control-plane,example-label,master,worker
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
custom-label/parameter1=true
kubernetes.io/arch=amd64
kubernetes.io/hostname=cnfdf03.telco5gran.eng.rdu2.redhat.com
kubernetes.io/os=linux
node-role.kubernetes.io/control-plane=
node-role.kubernetes.io/example-label= 1
node-role.kubernetes.io/master=
node-role.kubernetes.io/worker=
node.openshift.io/os_id=rhcos
- 1
- The custom label is applied to the node.
Additional resources
4.5.1. Accelerated provisioning of GitOps ZTP
Accelerated provisioning of GitOps ZTP is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.
For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.
You can reduce the time taken for cluster installation by using accelerated provisioning of GitOps ZTP for single-node OpenShift. Accelerated ZTP speeds up installation by applying Day 2 manifests derived from policies at an earlier stage.
Accelerated provisioning of GitOps ZTP is supported only when installing single-node OpenShift with Assisted Installer. Otherwise this installation method will fail.
4.5.1.1. Activating accelerated ZTP
You can activate accelerated ZTP using the spec.clusters.clusterLabels.accelerated-ztp
label, as in the following example:
Example Accelerated ZTP SiteConfig
CR.
apiVersion: ran.openshift.io/v2 kind: SiteConfig metadata: name: "example-sno" namespace: "example-sno" spec: baseDomain: "example.com" pullSecretRef: name: "assisted-deployment-pull-secret" clusterImageSetNameRef: "openshift-4.10" sshPublicKey: "ssh-rsa AAAA..." clusters: # ... clusterLabels: common: true group-du-sno: "" sites : "example-sno" accelerated-ztp: full
You can use accelerated-ztp: full
to fully automate the accelerated process. GitOps ZTP updates the AgentClusterInstall
resource with a reference to the accelerated GitOps ZTP ConfigMap
, and includes resources extracted from policies by TALM, and accelerated ZTP job manifests.
If you use accelerated-ztp: partial
, GitOps ZTP does not include the accelerated job manifests, but includes policy-derived objects created during the cluster installation of the following kind
types:
-
PerformanceProfile.performance.openshift.io
-
Tuned.tuned.openshift.io
-
Namespace
-
CatalogSource.operators.coreos.com
-
ContainerRuntimeConfig.machineconfiguration.openshift.io
This partial acceleration can reduce the number of reboots done by the node when applying resources of the kind Performance Profile
, Tuned
, and ContainerRuntimeConfig
. TALM installs the Operator subscriptions derived from policies after RHACM completes the import of the cluster, following the same flow as standard GitOps ZTP.
The benefits of accelerated ZTP increase with the scale of your deployment. Using accelerated-ztp: full
gives more benefit on a large number of clusters. With a smaller number of clusters, the reduction in installation time is less significant. Full accelerated ZTP leaves behind a namespace and a completed job on the spoke that need to be manually removed.
One benefit of using accelerated-ztp: partial
is that you can override the functionality of the on-spoke job if something goes wrong with the stock implementation or if you require a custom functionality.
4.5.1.2. The accelerated ZTP process
Accelerated ZTP uses an additional ConfigMap
to create the resources derived from policies on the spoke cluster. The standard ConfigMap
includes manifests that the GitOps ZTP workflow uses to customize cluster installs.
TALM detects that the accelerated-ztp
label is set and then creates a second ConfigMap
. As part of accelerated ZTP, the SiteConfig
generator adds a reference to that second ConfigMap
using the naming convention <spoke-cluster-name>-aztp
.
After TALM creates that second ConfigMap
, it finds all policies bound to the managed cluster and extracts the GitOps ZTP profile information. TALM adds the GitOps ZTP profile information to the <spoke-cluster-name>-aztp
ConfigMap
custom resource (CR) and applies the CR to the hub cluster API.
4.5.2. Configuring IPsec encryption for single-node OpenShift clusters using GitOps ZTP and SiteConfig resources
You can enable IPsec encryption in managed single-node OpenShift clusters that you install using GitOps ZTP and Red Hat Advanced Cluster Management (RHACM). You can encrypt traffic between the managed cluster and IPsec endpoints external to the managed cluster. All network traffic between nodes on the OVN-Kubernetes cluster network is encrypted with IPsec in Transport mode.
You can also configure IPsec encryption for single-node OpenShift clusters with an additional worker node by following this procedure. It is recommended to use the MachineConfig
custom resource (CR) to configure IPsec encryption for single-node OpenShift clusters and single-node OpenShift clusters with an additional worker node because of their low resource availability.
Prerequisites
-
You have installed the OpenShift CLI (
oc
). -
You have logged in to the hub cluster as a user with
cluster-admin
privileges. - You have configured RHACM and the hub cluster for generating the required installation and policy custom resources (CRs) for managed clusters.
- You have created a Git repository where you manage your custom site configuration data. The repository must be accessible from the hub cluster and be defined as a source repository for the Argo CD application.
-
You have installed the
butane
utility version 0.20.0 or later. - You have a PKCS#12 certificate for the IPsec endpoint and a CA cert in PEM format.
Procedure
-
Extract the latest version of the
ztp-site-generate
container source and merge it with your repository where you manage your custom site configuration data. Configure
optional-extra-manifest/ipsec/ipsec-endpoint-config.yaml
with the required values that configure IPsec in the cluster. For example:interfaces: - name: hosta_conn type: ipsec libreswan: left: '%defaultroute' leftid: '%fromcert' leftmodecfgclient: false leftcert: left_server 1 leftrsasigkey: '%cert' right: <external_host> 2 rightid: '%fromcert' rightrsasigkey: '%cert' rightsubnet: <external_address> 3 ikev2: insist 4 type: tunnel
- 1
- The value of this field must match with the name of the certificate used on the remote system.
- 2
- Replace
<external_host>
with the external host IP address or DNS hostname. - 3
- Replace
<external_address>
with the IP subnet of the external host on the other side of the IPsec tunnel. - 4
- Use the IKEv2 VPN encryption protocol only. Do not use IKEv1, which is deprecated.
Add the following certificates to the
optional-extra-manifest/ipsec
folder:-
left_server.p12
: The certificate bundle for the IPsec endpoints ca.pem
: The certificate authority that you signed your certificates withThe certificate files are required for the Network Security Services (NSS) database on each host. These files are imported as part of the Butane configuration in later steps.
-
-
Open a shell prompt at the
optional-extra-manifest/ipsec
folder of the Git repository where you maintain your custom site configuration data. Run the
optional-extra-manifest/ipsec/build.sh
script to generate the required Butane andMachineConfig
CRs files.If the PKCS#12 certificate is protected with a password, set the
-W
argument.Example output
out └── argocd └── example └── optional-extra-manifest └── ipsec ├── 99-ipsec-master-endpoint-config.bu 1 ├── 99-ipsec-master-endpoint-config.yaml 2 ├── 99-ipsec-worker-endpoint-config.bu 3 ├── 99-ipsec-worker-endpoint-config.yaml 4 ├── build.sh ├── ca.pem 5 ├── left_server.p12 6 ├── enable-ipsec.yaml ├── ipsec-endpoint-config.yml └── README.md
Create a
custom-manifest/
folder in the repository where you manage your custom site configuration data. Add theenable-ipsec.yaml
and99-ipsec-*
YAML files to the directory. For example:siteconfig ├── site1-sno-du.yaml ├── extra-manifest/ └── custom-manifest ├── enable-ipsec.yaml ├── 99-ipsec-worker-endpoint-config.yaml └── 99-ipsec-master-endpoint-config.yaml
In your
SiteConfig
CR, add thecustom-manifest/
directory to theextraManifests.searchPaths
field. For example:clusters: - clusterName: "site1-sno-du" networkType: "OVNKubernetes" extraManifests: searchPaths: - extra-manifest/ - custom-manifest/
Commit the
SiteConfig
CR changes and updated files in your Git repository and push the changes to provision the managed cluster and configure IPsec encryption.The Argo CD pipeline detects the changes and begins the managed cluster deployment.
During cluster provisioning, the GitOps ZTP pipeline appends the CRs in the
custom-manifest/
directory to the default set of extra manifests stored in theextra-manifest/
directory.
Verification
For information about verifying the IPsec encryption, see "Verifying the IPsec encryption".
4.5.3. Configuring IPsec encryption for multi-node clusters using GitOps ZTP and SiteConfig resources
You can enable IPsec encryption in managed multi-node clusters that you install using GitOps ZTP and Red Hat Advanced Cluster Management (RHACM). You can encrypt traffic between the managed cluster and IPsec endpoints external to the managed cluster. All network traffic between nodes on the OVN-Kubernetes cluster network is encrypted with IPsec in Transport mode.
Prerequisites
-
You have installed the OpenShift CLI (
oc
). -
You have logged in to the hub cluster as a user with
cluster-admin
privileges. - You have configured RHACM and the hub cluster for generating the required installation and policy custom resources (CRs) for managed clusters.
- You have created a Git repository where you manage your custom site configuration data. The repository must be accessible from the hub cluster and be defined as a source repository for the Argo CD application.
-
You have installed the
butane
utility version 0.20.0 or later. - You have a PKCS#12 certificate for the IPsec endpoint and a CA cert in PEM format.
- You have installed the NMState Operator.
Procedure
-
Extract the latest version of the
ztp-site-generate
container source and merge it with your repository where you manage your custom site configuration data. Configure the
optional-extra-manifest/ipsec/ipsec-config-policy.yaml
file with the required values that configure IPsec in the cluster.ConfigurationPolicy
object for creating an IPsec configurationapiVersion: policy.open-cluster-management.io/v1 kind: ConfigurationPolicy metadata: name: policy-config spec: namespaceSelector: include: ["default"] exclude: [] matchExpressions: [] matchLabels: {} remediationAction: inform severity: low evaluationInterval: compliant: noncompliant: object-templates-raw: | {{- range (lookup "v1" "Node" "" "").items }} - complianceType: musthave objectDefinition: kind: NodeNetworkConfigurationPolicy apiVersion: nmstate.io/v1 metadata: name: {{ .metadata.name }}-ipsec-policy spec: nodeSelector: kubernetes.io/hostname: {{ .metadata.name }} desiredState: interfaces: - name: hosta_conn type: ipsec libreswan: left: '%defaultroute' leftid: '%fromcert' leftmodecfgclient: false leftcert: left_server 1 leftrsasigkey: '%cert' right: <external_host> 2 rightid: '%fromcert' rightrsasigkey: '%cert' rightsubnet: <external_address> 3 ikev2: insist 4 type: tunnel
- 1
- The value of this field must match with the name of the certificate used on the remote system.
- 2
- Replace
<external_host>
with the external host IP address or DNS hostname. - 3
- Replace
<external_address>
with the IP subnet of the external host on the other side of the IPsec tunnel. - 4
- Use the IKEv2 VPN encryption protocol only. Do not use IKEv1, which is deprecated.
Add the following certificates to the
optional-extra-manifest/ipsec
folder:-
left_server.p12
: The certificate bundle for the IPsec endpoints ca.pem
: The certificate authority that you signed your certificates withThe certificate files are required for the Network Security Services (NSS) database on each host. These files are imported as part of the Butane configuration in later steps.
-
-
Open a shell prompt at the
optional-extra-manifest/ipsec
folder of the Git repository where you maintain your custom site configuration data. Run the
optional-extra-manifest/ipsec/import-certs.sh
script to generate the required Butane andMachineConfig
CRs to import the external certs.If the PKCS#12 certificate is protected with a password, set the
-W
argument.Example output
out └── argocd └── example └── optional-extra-manifest └── ipsec ├── 99-ipsec-master-import-certs.bu 1 ├── 99-ipsec-master-import-certs.yaml 2 ├── 99-ipsec-worker-import-certs.bu 3 ├── 99-ipsec-worker-import-certs.yaml 4 ├── import-certs.sh ├── ca.pem 5 ├── left_server.p12 6 ├── enable-ipsec.yaml ├── ipsec-config-policy.yaml └── README.md
Create a
custom-manifest/
folder in the repository where you manage your custom site configuration data and add theenable-ipsec.yaml
and99-ipsec-*
YAML files to the directory.Example
siteconfig
directorysiteconfig ├── site1-mno-du.yaml ├── extra-manifest/ └── custom-manifest ├── enable-ipsec.yaml ├── 99-ipsec-master-import-certs.yaml └── 99-ipsec-worker-import-certs.yaml
In your
SiteConfig
CR, add thecustom-manifest/
directory to theextraManifests.searchPaths
field, as in the following example:clusters: - clusterName: "site1-mno-du" networkType: "OVNKubernetes" extraManifests: searchPaths: - extra-manifest/ - custom-manifest/
-
Include the
ipsec-config-policy.yaml
config policy file in thesource-crs
directory in GitOps and reference the file in one of thePolicyGenerator
CRs. Commit the
SiteConfig
CR changes and updated files in your Git repository and push the changes to provision the managed cluster and configure IPsec encryption.The Argo CD pipeline detects the changes and begins the managed cluster deployment.
During cluster provisioning, the GitOps ZTP pipeline appends the CRs in the
custom-manifest/
directory to the default set of extra manifests stored in theextra-manifest/
directory.
Verification
For information about verifying the IPsec encryption, see "Verifying the IPsec encryption".
4.5.4. Verifying the IPsec encryption
You can verify that the IPsec encryption is successfully applied in a managed OpenShift Container Platform cluster.
Prerequisites
-
You have installed the OpenShift CLI (
oc
). -
You have logged in to the hub cluster as a user with
cluster-admin
privileges. - You have configured the IPsec encryption.
Procedure
Start a debug pod for the managed cluster by running the following command:
$ oc debug node/<node_name>
Check that the IPsec policy is applied in the cluster node by running the following command:
sh-5.1# ip xfrm policy
Example output
src 172.16.123.0/24 dst 10.1.232.10/32 dir out priority 1757377 ptype main tmpl src 10.1.28.190 dst 10.1.232.10 proto esp reqid 16393 mode tunnel src 10.1.232.10/32 dst 172.16.123.0/24 dir fwd priority 1757377 ptype main tmpl src 10.1.232.10 dst 10.1.28.190 proto esp reqid 16393 mode tunnel src 10.1.232.10/32 dst 172.16.123.0/24 dir in priority 1757377 ptype main tmpl src 10.1.232.10 dst 10.1.28.190 proto esp reqid 16393 mode tunnel
Check that the IPsec tunnel is up and connected by running the following command:
sh-5.1# ip xfrm state
Example output
src 10.1.232.10 dst 10.1.28.190 proto esp spi 0xa62a05aa reqid 16393 mode tunnel replay-window 0 flag af-unspec esn auth-trunc hmac(sha1) 0x8c59f680c8ea1e667b665d8424e2ab749cec12dc 96 enc cbc(aes) 0x2818a489fe84929c8ab72907e9ce2f0eac6f16f2258bd22240f4087e0326badb anti-replay esn context: seq-hi 0x0, seq 0x0, oseq-hi 0x0, oseq 0x0 replay_window 128, bitmap-length 4 00000000 00000000 00000000 00000000 src 10.1.28.190 dst 10.1.232.10 proto esp spi 0x8e96e9f9 reqid 16393 mode tunnel replay-window 0 flag af-unspec esn auth-trunc hmac(sha1) 0xd960ddc0a6baaccb343396a51295e08cfd8aaddd 96 enc cbc(aes) 0x0273c02e05b4216d5e652de3fc9b3528fea94648bc2b88fa01139fdf0beb27ab anti-replay esn context: seq-hi 0x0, seq 0x0, oseq-hi 0x0, oseq 0x0 replay_window 128, bitmap-length 4 00000000 00000000 00000000 00000000
Ping a known IP in the external host subnet by running the following command: For example, ping an IP address in the
rightsubnet
range that you set in theipsec/ipsec-endpoint-config.yaml
file:sh-5.1# ping 172.16.110.8
Example output
PING 172.16.110.8 (172.16.110.8) 56(84) bytes of data. 64 bytes from 172.16.110.8: icmp_seq=1 ttl=64 time=153 ms 64 bytes from 172.16.110.8: icmp_seq=2 ttl=64 time=155 ms
4.5.5. Single-node OpenShift SiteConfig CR installation reference
SiteConfig CR field | Description |
---|---|
|
Configure workload partitioning by setting the value for Note
Configuring workload partitioning by using the |
|
Set |
|
Configure the image set available on the hub cluster for all the clusters in the site. To see the list of supported versions on your hub cluster, run |
|
Set the Important
Use the reference configuration as specified in the example |
|
Specifies the cluster image set used to deploy an individual cluster. If defined, it overrides the |
|
Configure cluster labels to correspond to the binding rules in the
For example, |
|
Optional. Set |
| Configure this field to enable disk encryption with Trusted Platform Module (TPM) and Platform Configuration Registers (PCRs) protection. For more information, see "About disk encryption with TPM and PCR protection". Note
Configuring disk encryption by using the |
|
Set the disk encryption type to |
| Configure the Platform Configuration Registers (PCRs) protection for disk encryption. |
| Configure the list of Platform Configuration Registers (PCRs) to be used for disk encryption. You must use PCR registers 1 and 7. |
|
For single-node deployments, define a single host. For three-node deployments, define three hosts. For standard deployments, define three hosts with |
| Specify custom roles for your nodes in your managed clusters. These are additional roles are not used by any OpenShift Container Platform components, only by the user. When you add a custom role, it can be associated with a custom machine config pool that references a specific configuration for that role. Adding custom labels or roles during installation makes the deployment process more effective and prevents the need for additional reboots after the installation is complete. |
|
Optional. Uncomment and set the value to |
| BMC address that you use to access the host. Applies to all cluster types. GitOps ZTP supports iPXE and virtual media booting by using Redfish or IPMI protocols. To use iPXE booting, you must use RHACM 2.8 or later. For more information about BMC addressing, see the "Additional resources" section. |
| BMC address that you use to access the host. Applies to all cluster types. GitOps ZTP supports iPXE and virtual media booting by using Redfish or IPMI protocols. To use iPXE booting, you must use RHACM 2.8 or later. For more information about BMC addressing, see the "Additional resources" section. Note In far edge Telco use cases, only virtual media is supported for use with GitOps ZTP. |
|
Configure the |
|
Set the boot mode for the host to |
|
Specifies the device for deployment. Identifiers that are stable across reboots are recommended. For example, |
| Optional. Use this field to assign partitions for persistent storage. Adjust disk ID and size to the specific hardware. |
| Configure the network settings for the node. |
| Configure the IPv6 address for the host. For single-node OpenShift clusters with static IP addresses, the node-specific API and Ingress IPs should be the same. |
Additional resources
- About disk encryption with TPM and PCR protection.
- Customizing extra installation manifests in the GitOps ZTP pipeline
- Preparing the GitOps ZTP site configuration repository
- Configuring the hub cluster with ArgoCD
- Signalling GitOps ZTP cluster deployment completion with validator inform policies
- Creating the managed bare-metal host secrets
- BMC addressing
- About root device hints
4.6. Managing host firmware settings with GitOps ZTP
Hosts require the correct firmware configuration to ensure high performance and optimal efficiency. You can deploy custom host firmware configurations for managed clusters with GitOps ZTP.
Tune hosts with specific hardware profiles in your lab and ensure they are optimized for your requirements. When you have completed host tuning to your satisfaction, you extract the host profile and save it in your GitOps ZTP repository. Then, you use the host profile to configure firmware settings in the managed cluster hosts that you deploy with GitOps ZTP.
You specify the required hardware profiles in SiteConfig
custom resources (CRs) that you use to deploy the managed clusters. The GitOps ZTP pipeline generates the required HostFirmwareSettings
(HFS
) and BareMetalHost
(BMH
) CRs that are applied to the hub cluster.
Use the following best practices to manage your host firmware profiles.
- Identify critical firmware settings with hardware vendors
- Work with hardware vendors to identify and document critical host firmware settings required for optimal performance and compatibility with the deployed host platform.
- Use common firmware configurations across similar hardware platforms
- Where possible, use a standardized host firmware configuration across similar hardware platforms to reduce complexity and potential errors during deployment.
- Test firmware configurations in a lab environment
- Test host firmware configurations in a controlled lab environment before deploying in production to ensure that settings are compatible with hardware, firmware, and software.
- Manage firmware profiles in source control
- Manage host firmware profiles in Git repositories to track changes, ensure consistency, and facilitate collaboration with vendors.
Additional resources
4.6.1. Retrieving the host firmware schema for a managed cluster
You can discover the host firmware schema for managed clusters. The host firmware schema for bare-metal hosts is populated with information that the Ironic API returns. The API returns information about host firmware interfaces, including firmware setting types, allowable values, ranges, and flags.
Prerequisites
-
You have installed the OpenShift CLI (
oc
). -
You have installed Red Hat Advanced Cluster Management (RHACM) and logged in to the hub cluster as a user with
cluster-admin
privileges. - You have provisioned a cluster that is managed by RHACM.
Procedure
Discover the host firmware schema for the managed cluster. Run the following command:
$ oc get firmwareschema -n <managed_cluster_namespace> -o yaml
Example output
apiVersion: v1 items: - apiVersion: metal3.io/v1alpha1 kind: FirmwareSchema metadata: creationTimestamp: "2024-09-11T10:29:43Z" generation: 1 name: schema-40562318 namespace: compute-1 ownerReferences: - apiVersion: metal3.io/v1alpha1 kind: HostFirmwareSettings name: compute-1.example.com uid: 65d0e89b-1cd8-4317-966d-2fbbbe033fe9 resourceVersion: "280057624" uid: 511ad25d-f1c9-457b-9a96-776605c7b887 spec: schema: AccessControlService: allowable_values: - Enabled - Disabled attribute_type: Enumeration read_only: false # ...
4.6.2. Retrieving the host firmware settings for a managed cluster
You can retrieve the host firmware settings for managed clusters. This is useful when you have deployed changes to the host firmware and you want to monitor the changes and ensure that they are applied successfully.
Prerequisites
-
You have installed the OpenShift CLI (
oc
). -
You have installed Red Hat Advanced Cluster Management (RHACM) and logged in to the hub cluster as a user with
cluster-admin
privileges. - You have provisioned a cluster that is managed by RHACM.
Procedure
Retrieve the host firmware settings for the managed cluster. Run the following command:
$ oc get hostfirmwaresettings -n <cluster_namespace> <node_name> -o yaml
Example output
apiVersion: v1 items: - apiVersion: metal3.io/v1alpha1 kind: HostFirmwareSettings metadata: creationTimestamp: "2024-09-11T10:29:43Z" generation: 1 name: compute-1.example.com namespace: kni-qe-24 ownerReferences: - apiVersion: metal3.io/v1alpha1 blockOwnerDeletion: true controller: true kind: BareMetalHost name: compute-1.example.com uid: 0baddbb7-bb34-4224-8427-3d01d91c9287 resourceVersion: "280057626" uid: 65d0e89b-1cd8-4317-966d-2fbbbe033fe9 spec: settings: {} status: conditions: - lastTransitionTime: "2024-09-11T10:29:43Z" message: "" observedGeneration: 1 reason: Success status: "True" 1 type: ChangeDetected - lastTransitionTime: "2024-09-11T10:29:43Z" message: Invalid BIOS setting observedGeneration: 1 reason: ConfigurationError status: "False" 2 type: Valid lastUpdated: "2024-09-11T10:29:43Z" schema: name: schema-40562318 namespace: compute-1 settings: 3 AccessControlService: Enabled AcpiHpet: Enabled AcpiRootBridgePxm: Enabled # ...
Optional: Check the status of the
HostFirmwareSettings
(hfs
) custom resource in the cluster:$ oc get hfs -n <managed_cluster_namespace> <managed_cluster_name> -o jsonpath='{.status.conditions[?(@.type=="ChangeDetected")].status}'
Example output
True
Optional: Check for invalid firmware settings in the cluster host. Run the following command:
$ oc get hfs -n <managed_cluster_namespace> <managed_cluster_name> -o jsonpath='{.status.conditions[?(@.type=="Valid")].status}'
Example output
False
4.6.3. Deploying user-defined firmware to cluster hosts with GitOps ZTP
You can deploy user-defined firmware settings to cluster hosts by configuring the SiteConfig
custom resource (CR) to include a hardware profile that you want to apply during cluster host provisioning. You can configure hardware profiles to apply to hosts in the following scenarios:
- All hosts site-wide
- Only cluster hosts that meet certain criteria
- Individual cluster hosts
You can configure host hardware profiles to be applied in a hierarchy. Cluster-level settings override site-wide settings. Node level profiles override cluster and site-wide settings.
Prerequisites
-
You have installed the OpenShift CLI (
oc
). -
You have installed Red Hat Advanced Cluster Management (RHACM) and logged in to the hub cluster as a user with
cluster-admin
privileges. - You have provisioned a cluster that is managed by RHACM.
- You created a Git repository where you manage your custom site configuration data. The repository must be accessible from the hub cluster and be defined as a source repository for the Argo CD application.
Procedure
Create the host firmware profile that contain the firmware settings you want to apply. For example, create the following YAML file:
host-firmware.profile
BootMode: Uefi LogicalProc: Enabled ProcVirtualization: Enabled
Save the hardware profile YAML file relative to the
kustomization.yaml
file that you use to define how to provision the cluster, for example:example-ztp/install └── site-install ├── siteconfig-example.yaml ├── kustomization.yaml └── host-firmware.profile
Edit the
SiteConfig
CR to include the firmware profile that you want to apply in the cluster. For example:apiVersion: ran.openshift.io/v1 kind: SiteConfig metadata: name: "site-plan-cluster" namespace: "example-cluster-namespace" spec: baseDomain: "example.com" # ... biosConfigRef: filePath: "./host-firmware.profile" 1
- 1
- Applies the hardware profile to all cluster hosts site-wide
NoteWhere possible, use a single
SiteConfig
CR per cluster.Optional. To apply a hardware profile to hosts in a specific cluster, update
clusters.biosConfigRef.filePath
with the hardware profile that you want to apply. For example:clusters: - clusterName: "cluster-1" # ... biosConfigRef: filePath: "./host-firmware.profile" 1
- 1
- Applies to all hosts in the
cluster-1
cluster
Optional. To apply a hardware profile to a specific host in the cluster, update
clusters.nodes.biosConfigRef.filePath
with the hardware profile that you want to apply. For example:clusters: - clusterName: "cluster-1" # ... nodes: - hostName: "compute-1.example.com" # ... bootMode: "UEFI" biosConfigRef: filePath: "./host-firmware.profile" 1
- 1
- Applies the firmware profile to the
compute-1.example.com
host in the cluster
Commit the
SiteConfig
CR and associatedkustomization.yaml
changes in your Git repository and push the changes.The ArgoCD pipeline detects the changes and begins the managed cluster deployment.
NoteCluster deployment proceeds even if an invalid firmware setting is detected. To apply a correction using GitOps ZTP, re-deploy the cluster with the corrected hardware profile.
Verification
Check that the firmware settings have been applied in the managed cluster host. For example, run the following command:
$ oc get hfs -n <managed_cluster_namespace> <managed_cluster_name> -o jsonpath='{.status.conditions[?(@.type=="Valid")].status}'
Example output
True
4.7. Monitoring managed cluster installation progress
The ArgoCD pipeline uses the SiteConfig
CR to generate the cluster configuration CRs and syncs it with the hub cluster. You can monitor the progress of the synchronization in the ArgoCD dashboard.
Prerequisites
-
You have installed the OpenShift CLI (
oc
). -
You have logged in to the hub cluster as a user with
cluster-admin
privileges.
Procedure
When the synchronization is complete, the installation generally proceeds as follows:
The Assisted Service Operator installs OpenShift Container Platform on the cluster. You can monitor the progress of cluster installation from the RHACM dashboard or from the command line by running the following commands:
Export the cluster name:
$ export CLUSTER=<clusterName>
Query the
AgentClusterInstall
CR for the managed cluster:$ oc get agentclusterinstall -n $CLUSTER $CLUSTER -o jsonpath='{.status.conditions[?(@.type=="Completed")]}' | jq
Get the installation events for the cluster:
$ curl -sk $(oc get agentclusterinstall -n $CLUSTER $CLUSTER -o jsonpath='{.status.debugInfo.eventsURL}') | jq '.[-2,-1]'
4.8. Troubleshooting GitOps ZTP by validating the installation CRs
The ArgoCD pipeline uses the SiteConfig
and PolicyGenerator
or PolicyGentemplate
custom resources (CRs) to generate the cluster configuration CRs and Red Hat Advanced Cluster Management (RHACM) policies. Use the following steps to troubleshoot issues that might occur during this process.
Prerequisites
-
You have installed the OpenShift CLI (
oc
). -
You have logged in to the hub cluster as a user with
cluster-admin
privileges.
Procedure
Check that the installation CRs were created by using the following command:
$ oc get AgentClusterInstall -n <cluster_name>
If no object is returned, use the following steps to troubleshoot the ArgoCD pipeline flow from
SiteConfig
files to the installation CRs.Verify that the
ManagedCluster
CR was generated using theSiteConfig
CR on the hub cluster:$ oc get managedcluster
If the
ManagedCluster
is missing, check if theclusters
application failed to synchronize the files from the Git repository to the hub cluster:$ oc describe -n openshift-gitops application clusters
Check for the
Status.Conditions
field to view the error logs for the managed cluster. For example, setting an invalid value forextraManifestPath:
in theSiteConfig
CR raises the following error:Status: Conditions: Last Transition Time: 2021-11-26T17:21:39Z Message: rpc error: code = Unknown desc = `kustomize build /tmp/https___git.com/ran-sites/siteconfigs/ --enable-alpha-plugins` failed exit status 1: 2021/11/26 17:21:40 Error could not create extra-manifest ranSite1.extra-manifest3 stat extra-manifest3: no such file or directory 2021/11/26 17:21:40 Error: could not build the entire SiteConfig defined by /tmp/kust-plugin-config-913473579: stat extra-manifest3: no such file or directory Error: failure in plugin configured via /tmp/kust-plugin-config-913473579; exit status 1: exit status 1 Type: ComparisonError
Check the
Status.Sync
field. If there are log errors, theStatus.Sync
field could indicate anUnknown
error:Status: Sync: Compared To: Destination: Namespace: clusters-sub Server: https://kubernetes.default.svc Source: Path: sites-config Repo URL: https://git.com/ran-sites/siteconfigs/.git Target Revision: master Status: Unknown
4.9. Troubleshooting GitOps ZTP virtual media booting on SuperMicro servers
SuperMicro X11 servers do not support virtual media installations when the image is served using the https
protocol. As a result, single-node OpenShift deployments for this environment fail to boot on the target node. To avoid this issue, log in to the hub cluster and disable Transport Layer Security (TLS) in the Provisioning
resource. This ensures the image is not served with TLS even though the image address uses the https
scheme.
Prerequisites
-
You have installed the OpenShift CLI (
oc
). -
You have logged in to the hub cluster as a user with
cluster-admin
privileges.
Procedure
Disable TLS in the
Provisioning
resource by running the following command:$ oc patch provisioning provisioning-configuration --type merge -p '{"spec":{"disableVirtualMediaTLS": true}}'
- Continue the steps to deploy your single-node OpenShift cluster.
4.10. Removing a managed cluster site from the GitOps ZTP pipeline
You can remove a managed site and the associated installation and configuration policy CRs from the GitOps Zero Touch Provisioning (ZTP) pipeline.
Prerequisites
-
You have installed the OpenShift CLI (
oc
). -
You have logged in to the hub cluster as a user with
cluster-admin
privileges.
Procedure
-
Remove a site and the associated CRs by removing the associated
SiteConfig
andPolicyGenerator
orPolicyGentemplate
files from thekustomization.yaml
file. Add the following
syncOptions
field to yourSiteConfig
application.kind: Application spec: syncPolicy: syncOptions: - PrunePropagationPolicy=background
When you run the GitOps ZTP pipeline again, the generated CRs are removed.
-
Optional: If you want to permanently remove a site, you should also remove the
SiteConfig
and site-specificPolicyGenerator
orPolicyGentemplate
files from the Git repository. -
Optional: If you want to remove a site temporarily, for example when redeploying a site, you can leave the
SiteConfig
and site-specificPolicyGenerator
orPolicyGentemplate
CRs in the Git repository.
Additional resources
- For information about removing a cluster, see Removing a cluster from management.
4.11. Removing obsolete content from the GitOps ZTP pipeline
If a change to the PolicyGenerator
or PolicyGentemplate
configuration results in obsolete policies, for example, if you rename policies, use the following procedure to remove the obsolete policies.
Prerequisites
-
You have installed the OpenShift CLI (
oc
). -
You have logged in to the hub cluster as a user with
cluster-admin
privileges.
Procedure
-
Remove the affected
PolicyGenerator
orPolicyGentemplate
files from the Git repository, commit and push to the remote repository. - Wait for the changes to synchronize through the application and the affected policies to be removed from the hub cluster.
Add the updated
PolicyGenerator
orPolicyGentemplate
files back to the Git repository, and then commit and push to the remote repository.NoteRemoving GitOps Zero Touch Provisioning (ZTP) policies from the Git repository, and as a result also removing them from the hub cluster, does not affect the configuration of the managed cluster. The policy and CRs managed by that policy remains in place on the managed cluster.
Optional: As an alternative, after making changes to
PolicyGenerator
orPolicyGentemplate
CRs that result in obsolete policies, you can remove these policies from the hub cluster manually. You can delete policies from the RHACM console using the Governance tab or by running the following command:$ oc delete policy -n <namespace> <policy_name>
4.12. Tearing down the GitOps ZTP pipeline
You can remove the ArgoCD pipeline and all generated GitOps Zero Touch Provisioning (ZTP) artifacts.
Prerequisites
-
You have installed the OpenShift CLI (
oc
). -
You have logged in to the hub cluster as a user with
cluster-admin
privileges.
Procedure
- Detach all clusters from Red Hat Advanced Cluster Management (RHACM) on the hub cluster.
Delete the
kustomization.yaml
file in thedeployment
directory using the following command:$ oc delete -k out/argocd/deployment
- Commit and push your changes to the site repository.
Chapter 5. Manually installing a single-node OpenShift cluster with GitOps ZTP
You can deploy a managed single-node OpenShift cluster by using Red Hat Advanced Cluster Management (RHACM) and the assisted service.
If you are creating multiple managed clusters, use the SiteConfig
method described in Deploying far edge sites with ZTP.
The target bare-metal host must meet the networking, firmware, and hardware requirements listed in Recommended cluster configuration for vDU application workloads.
5.1. Generating GitOps ZTP installation and configuration CRs manually
Use the generator
entrypoint for the ztp-site-generate
container to generate the site installation and configuration custom resource (CRs) for a cluster based on SiteConfig
and PolicyGenerator
CRs.
Prerequisites
-
You have installed the OpenShift CLI (
oc
). -
You have logged in to the hub cluster as a user with
cluster-admin
privileges.
Procedure
Create an output folder by running the following command:
$ mkdir -p ./out
Export the
argocd
directory from theztp-site-generate
container image:$ podman run --log-driver=none --rm registry.redhat.io/openshift4/ztp-site-generate-rhel8:v4.17 extract /home/ztp --tar | tar x -C ./out
The
./out
directory has the referencePolicyGenerator
andSiteConfig
CRs in theout/argocd/example/
folder.Example output
out └── argocd └── example ├── acmpolicygenerator │ ├── {policy-prefix}common-ranGen.yaml │ ├── {policy-prefix}example-sno-site.yaml │ ├── {policy-prefix}group-du-sno-ranGen.yaml │ ├── {policy-prefix}group-du-sno-validator-ranGen.yaml │ ├── ... │ ├── kustomization.yaml │ └── ns.yaml └── siteconfig ├── example-sno.yaml ├── KlusterletAddonConfigOverride.yaml └── kustomization.yaml
Create an output folder for the site installation CRs:
$ mkdir -p ./site-install
Modify the example
SiteConfig
CR for the cluster type that you want to install. Copyexample-sno.yaml
tosite-1-sno.yaml
and modify the CR to match the details of the site and bare-metal host that you want to install, for example:# example-node1-bmh-secret & assisted-deployment-pull-secret need to be created under same namespace example-sno --- apiVersion: ran.openshift.io/v1 kind: SiteConfig metadata: name: "example-sno" namespace: "example-sno" spec: baseDomain: "example.com" pullSecretRef: name: "assisted-deployment-pull-secret" clusterImageSetNameRef: "openshift-4.16" sshPublicKey: "ssh-rsa AAAA..." clusters: - clusterName: "example-sno" networkType: "OVNKubernetes" # installConfigOverrides is a generic way of passing install-config # parameters through the siteConfig. The 'capabilities' field configures # the composable openshift feature. In this 'capabilities' setting, we # remove all the optional set of components. # Notes: # - OperatorLifecycleManager is needed for 4.15 and later # - NodeTuning is needed for 4.13 and later, not for 4.12 and earlier # - Ingress is needed for 4.16 and later installConfigOverrides: | { "capabilities": { "baselineCapabilitySet": "None", "additionalEnabledCapabilities": [ "NodeTuning", "OperatorLifecycleManager", "Ingress" ] } } # It is strongly recommended to include crun manifests as part of the additional install-time manifests for 4.13+. # The crun manifests can be obtained from source-crs/optional-extra-manifest/ and added to the git repo ie.sno-extra-manifest. # extraManifestPath: sno-extra-manifest clusterLabels: # These example cluster labels correspond to the bindingRules in the PolicyGenTemplate examples du-profile: "latest" # These example cluster labels correspond to the bindingRules in the PolicyGenTemplate examples in ../policygentemplates: # ../policygentemplates/common-ranGen.yaml will apply to all clusters with 'common: true' common: true # ../policygentemplates/group-du-sno-ranGen.yaml will apply to all clusters with 'group-du-sno: ""' group-du-sno: "" # ../policygentemplates/example-sno-site.yaml will apply to all clusters with 'sites: "example-sno"' # Normally this should match or contain the cluster name so it only applies to a single cluster sites: "example-sno" clusterNetwork: - cidr: 1001:1::/48 hostPrefix: 64 machineNetwork: - cidr: 1111:2222:3333:4444::/64 serviceNetwork: - 1001:2::/112 additionalNTPSources: - 1111:2222:3333:4444::2 # Initiates the cluster for workload partitioning. Setting specific reserved/isolated CPUSets is done via PolicyTemplate # please see Workload Partitioning Feature for a complete guide. cpuPartitioningMode: AllNodes # Optionally; This can be used to override the KlusterletAddonConfig that is created for this cluster: #crTemplates: # KlusterletAddonConfig: "KlusterletAddonConfigOverride.yaml" nodes: - hostName: "example-node1.example.com" role: "master" # Optionally; This can be used to configure desired BIOS setting on a host: #biosConfigRef: # filePath: "example-hw.profile" bmcAddress: "idrac-virtualmedia+https://[1111:2222:3333:4444::bbbb:1]/redfish/v1/Systems/System.Embedded.1" bmcCredentialsName: name: "example-node1-bmh-secret" bootMACAddress: "AA:BB:CC:DD:EE:11" # Use UEFISecureBoot to enable secure boot. bootMode: "UEFISecureBoot" rootDeviceHints: deviceName: "/dev/disk/by-path/pci-0000:01:00.0-scsi-0:2:0:0" # disk partition at `/var/lib/containers` with ignitionConfigOverride. Some values must be updated. See DiskPartitionContainer.md for more details ignitionConfigOverride: | { "ignition": { "version": "3.2.0" }, "storage": { "disks": [ { "device": "/dev/disk/by-id/wwn-0x6b07b250ebb9d0002a33509f24af1f62", "partitions": [ { "label": "var-lib-containers", "sizeMiB": 0, "startMiB": 250000 } ], "wipeTable": false } ], "filesystems": [ { "device": "/dev/disk/by-partlabel/var-lib-containers", "format": "xfs", "mountOptions": [ "defaults", "prjquota" ], "path": "/var/lib/containers", "wipeFilesystem": true } ] }, "systemd": { "units": [ { "contents": "# Generated by Butane\n[Unit]\nRequires=systemd-fsck@dev-disk-by\\x2dpartlabel-var\\x2dlib\\x2dcontainers.service\nAfter=systemd-fsck@dev-disk-by\\x2dpartlabel-var\\x2dlib\\x2dcontainers.service\n\n[Mount]\nWhere=/var/lib/containers\nWhat=/dev/disk/by-partlabel/var-lib-containers\nType=xfs\nOptions=defaults,prjquota\n\n[Install]\nRequiredBy=local-fs.target", "enabled": true, "name": "var-lib-containers.mount" } ] } } nodeNetwork: interfaces: - name: eno1 macAddress: "AA:BB:CC:DD:EE:11" config: interfaces: - name: eno1 type: ethernet state: up ipv4: enabled: false ipv6: enabled: true address: # For SNO sites with static IP addresses, the node-specific, # API and Ingress IPs should all be the same and configured on # the interface - ip: 1111:2222:3333:4444::aaaa:1 prefix-length: 64 dns-resolver: config: search: - example.com server: - 1111:2222:3333:4444::2 routes: config: - destination: ::/0 next-hop-interface: eno1 next-hop-address: 1111:2222:3333:4444::1 table-id: 254
NoteOnce you have extracted reference CR configuration files from the
out/extra-manifest
directory of theztp-site-generate
container, you can useextraManifests.searchPaths
to include the path to the git directory containing those files. This allows the GitOps ZTP pipeline to apply those CR files during cluster installation. If you configure asearchPaths
directory, the GitOps ZTP pipeline does not fetch manifests from theztp-site-generate
container during site installation.Generate the Day 0 installation CRs by processing the modified
SiteConfig
CRsite-1-sno.yaml
by running the following command:$ podman run -it --rm -v `pwd`/out/argocd/example/siteconfig:/resources:Z -v `pwd`/site-install:/output:Z,U registry.redhat.io/openshift4/ztp-site-generate-rhel8:v4.17 generator install site-1-sno.yaml /output
Example output
site-install └── site-1-sno ├── site-1_agentclusterinstall_example-sno.yaml ├── site-1-sno_baremetalhost_example-node1.example.com.yaml ├── site-1-sno_clusterdeployment_example-sno.yaml ├── site-1-sno_configmap_example-sno.yaml ├── site-1-sno_infraenv_example-sno.yaml ├── site-1-sno_klusterletaddonconfig_example-sno.yaml ├── site-1-sno_machineconfig_02-master-workload-partitioning.yaml ├── site-1-sno_machineconfig_predefined-extra-manifests-master.yaml ├── site-1-sno_machineconfig_predefined-extra-manifests-worker.yaml ├── site-1-sno_managedcluster_example-sno.yaml ├── site-1-sno_namespace_example-sno.yaml └── site-1-sno_nmstateconfig_example-node1.example.com.yaml
Optional: Generate just the Day 0
MachineConfig
installation CRs for a particular cluster type by processing the referenceSiteConfig
CR with the-E
option. For example, run the following commands:Create an output folder for the
MachineConfig
CRs:$ mkdir -p ./site-machineconfig
Generate the
MachineConfig
installation CRs:$ podman run -it --rm -v `pwd`/out/argocd/example/siteconfig:/resources:Z -v `pwd`/site-machineconfig:/output:Z,U registry.redhat.io/openshift4/ztp-site-generate-rhel8:v4.17 generator install -E site-1-sno.yaml /output
Example output
site-machineconfig └── site-1-sno ├── site-1-sno_machineconfig_02-master-workload-partitioning.yaml ├── site-1-sno_machineconfig_predefined-extra-manifests-master.yaml └── site-1-sno_machineconfig_predefined-extra-manifests-worker.yaml
Generate and export the Day 2 configuration CRs using the reference
PolicyGenerator
CRs from the previous step. Run the following commands:Create an output folder for the Day 2 CRs:
$ mkdir -p ./ref
Generate and export the Day 2 configuration CRs:
$ podman run -it --rm -v `pwd`/out/argocd/example/acmpolicygenerator:/resources:Z -v `pwd`/ref:/output:Z,U registry.redhat.io/openshift4/ztp-site-generate-rhel8:v4.17 generator config -N . /output
The command generates example group and site-specific
PolicyGenerator
CRs for single-node OpenShift, three-node clusters, and standard clusters in the./ref
folder.Example output
ref └── customResource ├── common ├── example-multinode-site ├── example-sno ├── group-du-3node ├── group-du-3node-validator │ └── Multiple-validatorCRs ├── group-du-sno ├── group-du-sno-validator ├── group-du-standard └── group-du-standard-validator └── Multiple-validatorCRs
- Use the generated CRs as the basis for the CRs that you use to install the cluster. You apply the installation CRs to the hub cluster as described in "Installing a single managed cluster". The configuration CRs can be applied to the cluster after cluster installation is complete.
Verification
Verify that the custom roles and labels are applied after the node is deployed:
$ oc describe node example-node.example.com
Example output
Name: example-node.example.com
Roles: control-plane,example-label,master,worker
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
custom-label/parameter1=true
kubernetes.io/arch=amd64
kubernetes.io/hostname=cnfdf03.telco5gran.eng.rdu2.redhat.com
kubernetes.io/os=linux
node-role.kubernetes.io/control-plane=
node-role.kubernetes.io/example-label= 1
node-role.kubernetes.io/master=
node-role.kubernetes.io/worker=
node.openshift.io/os_id=rhcos
- 1
- The custom label is applied to the node.
5.2. Creating the managed bare-metal host secrets
Add the required Secret
custom resources (CRs) for the managed bare-metal host to the hub cluster. You need a secret for the GitOps Zero Touch Provisioning (ZTP) pipeline to access the Baseboard Management Controller (BMC) and a secret for the assisted installer service to pull cluster installation images from the registry.
The secrets are referenced from the SiteConfig
CR by name. The namespace must match the SiteConfig
namespace.
Procedure
Create a YAML secret file containing credentials for the host Baseboard Management Controller (BMC) and a pull secret required for installing OpenShift and all add-on cluster Operators:
Save the following YAML as the file
example-sno-secret.yaml
:apiVersion: v1 kind: Secret metadata: name: example-sno-bmc-secret namespace: example-sno 1 data: 2 password: <base64_password> username: <base64_username> type: Opaque --- apiVersion: v1 kind: Secret metadata: name: pull-secret namespace: example-sno 3 data: .dockerconfigjson: <pull_secret> 4 type: kubernetes.io/dockerconfigjson
-
Add the relative path to
example-sno-secret.yaml
to thekustomization.yaml
file that you use to install the cluster.
5.3. Configuring Discovery ISO kernel arguments for manual installations using GitOps ZTP
The GitOps Zero Touch Provisioning (ZTP) workflow uses the Discovery ISO as part of the OpenShift Container Platform installation process on managed bare-metal hosts. You can edit the InfraEnv
resource to specify kernel arguments for the Discovery ISO. This is useful for cluster installations with specific environmental requirements. For example, configure the rd.net.timeout.carrier
kernel argument for the Discovery ISO to facilitate static networking for the cluster or to receive a DHCP address before downloading the root file system during installation.
In OpenShift Container Platform 4.17, you can only add kernel arguments. You can not replace or delete kernel arguments.
Prerequisites
- You have installed the OpenShift CLI (oc).
- You have logged in to the hub cluster as a user with cluster-admin privileges.
- You have manually generated the installation and configuration custom resources (CRs).
Procedure
-
Edit the
spec.kernelArguments
specification in theInfraEnv
CR to configure kernel arguments:
apiVersion: agent-install.openshift.io/v1beta1 kind: InfraEnv metadata: name: <cluster_name> namespace: <cluster_name> spec: kernelArguments: - operation: append 1 value: audit=0 2 - operation: append value: trace=1 clusterRef: name: <cluster_name> namespace: <cluster_name> pullSecretRef: name: pull-secret
The SiteConfig
CR generates the InfraEnv
resource as part of the day-0 installation CRs.
Verification
To verify that the kernel arguments are applied, after the Discovery image verifies that OpenShift Container Platform is ready for installation, you can SSH to the target host before the installation process begins. At that point, you can view the kernel arguments for the Discovery ISO in the /proc/cmdline
file.
Begin an SSH session with the target host:
$ ssh -i /path/to/privatekey core@<host_name>
View the system’s kernel arguments by using the following command:
$ cat /proc/cmdline
5.4. Installing a single managed cluster
You can manually deploy a single managed cluster using the assisted service and Red Hat Advanced Cluster Management (RHACM).
Prerequisites
-
You have installed the OpenShift CLI (
oc
). -
You have logged in to the hub cluster as a user with
cluster-admin
privileges. -
You have created the baseboard management controller (BMC)
Secret
and the image pull-secretSecret
custom resources (CRs). See "Creating the managed bare-metal host secrets" for details. - Your target bare-metal host meets the networking and hardware requirements for managed clusters.
Procedure
Create a
ClusterImageSet
for each specific cluster version to be deployed, for exampleclusterImageSet-4.17.yaml
. AClusterImageSet
has the following format:apiVersion: hive.openshift.io/v1 kind: ClusterImageSet metadata: name: openshift-4.17.0 1 spec: releaseImage: quay.io/openshift-release-dev/ocp-release:4.17.0-x86_64 2
Apply the
clusterImageSet
CR:$ oc apply -f clusterImageSet-4.17.yaml
Create the
Namespace
CR in thecluster-namespace.yaml
file:apiVersion: v1 kind: Namespace metadata: name: <cluster_name> 1 labels: name: <cluster_name> 2
Apply the
Namespace
CR by running the following command:$ oc apply -f cluster-namespace.yaml
Apply the generated day-0 CRs that you extracted from the
ztp-site-generate
container and customized to meet your requirements:$ oc apply -R ./site-install/site-sno-1
5.5. Monitoring the managed cluster installation status
Ensure that cluster provisioning was successful by checking the cluster status.
Prerequisites
-
All of the custom resources have been configured and provisioned, and the
Agent
custom resource is created on the hub for the managed cluster.
Procedure
Check the status of the managed cluster:
$ oc get managedcluster
True
indicates the managed cluster is ready.Check the agent status:
$ oc get agent -n <cluster_name>
Use the
describe
command to provide an in-depth description of the agent’s condition. Statuses to be aware of includeBackendError
,InputError
,ValidationsFailing
,InstallationFailed
, andAgentIsConnected
. These statuses are relevant to theAgent
andAgentClusterInstall
custom resources.$ oc describe agent -n <cluster_name>
Check the cluster provisioning status:
$ oc get agentclusterinstall -n <cluster_name>
Use the
describe
command to provide an in-depth description of the cluster provisioning status:$ oc describe agentclusterinstall -n <cluster_name>
Check the status of the managed cluster’s add-on services:
$ oc get managedclusteraddon -n <cluster_name>
Retrieve the authentication information of the
kubeconfig
file for the managed cluster:$ oc get secret -n <cluster_name> <cluster_name>-admin-kubeconfig -o jsonpath={.data.kubeconfig} | base64 -d > <directory>/<cluster_name>-kubeconfig
5.6. Troubleshooting the managed cluster
Use this procedure to diagnose any installation issues that might occur with the managed cluster.
Procedure
Check the status of the managed cluster:
$ oc get managedcluster
Example output
NAME HUB ACCEPTED MANAGED CLUSTER URLS JOINED AVAILABLE AGE SNO-cluster true True True 2d19h
If the status in the
AVAILABLE
column isTrue
, the managed cluster is being managed by the hub.If the status in the
AVAILABLE
column isUnknown
, the managed cluster is not being managed by the hub. Use the following steps to continue checking to get more information.Check the
AgentClusterInstall
install status:$ oc get clusterdeployment -n <cluster_name>
Example output
NAME PLATFORM REGION CLUSTERTYPE INSTALLED INFRAID VERSION POWERSTATE AGE Sno0026 agent-baremetal false Initialized 2d14h
If the status in the
INSTALLED
column isfalse
, the installation was unsuccessful.If the installation failed, enter the following command to review the status of the
AgentClusterInstall
resource:$ oc describe agentclusterinstall -n <cluster_name> <cluster_name>
Resolve the errors and reset the cluster:
Remove the cluster’s managed cluster resource:
$ oc delete managedcluster <cluster_name>
Remove the cluster’s namespace:
$ oc delete namespace <cluster_name>
This deletes all of the namespace-scoped custom resources created for this cluster. You must wait for the
ManagedCluster
CR deletion to complete before proceeding.- Recreate the custom resources for the managed cluster.
5.7. RHACM generated cluster installation CRs reference
Red Hat Advanced Cluster Management (RHACM) supports deploying OpenShift Container Platform on single-node clusters, three-node clusters, and standard clusters with a specific set of installation custom resources (CRs) that you generate using SiteConfig
CRs for each site.
Every managed cluster has its own namespace, and all of the installation CRs except for ManagedCluster
and ClusterImageSet
are under that namespace. ManagedCluster
and ClusterImageSet
are cluster-scoped, not namespace-scoped. The namespace and the CR names match the cluster name.
The following table lists the installation CRs that are automatically applied by the RHACM assisted service when it installs clusters using the SiteConfig
CRs that you configure.
CR | Description | Usage |
---|---|---|
| Contains the connection information for the Baseboard Management Controller (BMC) of the target bare-metal host. | Provides access to the BMC to load and start the discovery image on the target server by using the Redfish protocol. |
| Contains information for installing OpenShift Container Platform on the target bare-metal host. |
Used with |
|
Specifies details of the managed cluster configuration such as networking and the number of control plane nodes. Displays the cluster | Specifies the managed cluster configuration information and provides status during the installation of the cluster. |
|
References the |
Used with |
|
Provides network configuration information such as | Sets up a static IP address for the managed cluster’s Kube API server. |
| Contains hardware information about the target bare-metal host. | Created automatically on the hub when the target machine’s discovery image boots. |
| When a cluster is managed by the hub, it must be imported and known. This Kubernetes object provides that interface. | The hub uses this resource to manage and show the status of managed clusters. |
|
Contains the list of services provided by the hub to be deployed to the |
Tells the hub which addon services to deploy to the |
|
Logical space for |
Propagates resources to the |
|
Two CRs are created: |
|
| Contains OpenShift Container Platform image information such as the repository and image name. | Passed into resources to provide OpenShift Container Platform images. |
Chapter 6. Recommended single-node OpenShift cluster configuration for vDU application workloads
Use the following reference information to understand the single-node OpenShift configurations required to deploy virtual distributed unit (vDU) applications in the cluster. Configurations include cluster optimizations for high performance workloads, enabling workload partitioning, and minimizing the number of reboots required postinstallation.
Additional resources
- To deploy a single cluster by hand, see Manually installing a single-node OpenShift cluster with GitOps ZTP.
- To deploy a fleet of clusters using GitOps Zero Touch Provisioning (ZTP), see Deploying far edge sites with GitOps ZTP.
6.1. Running low latency applications on OpenShift Container Platform
OpenShift Container Platform enables low latency processing for applications running on commercial off-the-shelf (COTS) hardware by using several technologies and specialized hardware devices:
- Real-time kernel for RHCOS
- Ensures workloads are handled with a high degree of process determinism.
- CPU isolation
- Avoids CPU scheduling delays and ensures CPU capacity is available consistently.
- NUMA-aware topology management
- Aligns memory and huge pages with CPU and PCI devices to pin guaranteed container memory and huge pages to the non-uniform memory access (NUMA) node. Pod resources for all Quality of Service (QoS) classes stay on the same NUMA node. This decreases latency and improves performance of the node.
- Huge pages memory management
- Using huge page sizes improves system performance by reducing the amount of system resources required to access page tables.
- Precision timing synchronization using PTP
- Allows synchronization between nodes in the network with sub-microsecond accuracy.
6.2. Recommended cluster host requirements for vDU application workloads
Running vDU application workloads requires a bare-metal host with sufficient resources to run OpenShift Container Platform services and production workloads.
Profile | vCPU | Memory | Storage |
---|---|---|---|
Minimum | 4 to 8 vCPU | 32GB of RAM | 120GB |
One vCPU equals one physical core. However, if you enable simultaneous multithreading (SMT), or Hyper-Threading, use the following formula to calculate the number of vCPUs that represent one physical core:
- (threads per core × cores) × sockets = vCPUs
The server must have a Baseboard Management Controller (BMC) when booting with virtual media.
6.3. Configuring host firmware for low latency and high performance
Bare-metal hosts require the firmware to be configured before the host can be provisioned. The firmware configuration is dependent on the specific hardware and the particular requirements of your installation.
Procedure
-
Set the UEFI/BIOS Boot Mode to
UEFI
. - In the host boot sequence order, set Hard drive first.
Apply the specific firmware configuration for your hardware. The following table describes a representative firmware configuration for an Intel Xeon Skylake server and later hardware generations, based on the Intel FlexRAN 4G and 5G baseband PHY reference design.
ImportantThe exact firmware configuration depends on your specific hardware and network requirements. The following sample configuration is for illustrative purposes only.
Table 6.2. Sample firmware configuration Firmware setting Configuration CPU Power and Performance Policy
Performance
Uncore Frequency Scaling
Disabled
Performance P-limit
Disabled
Enhanced Intel SpeedStep ® Tech
Enabled
Intel Configurable TDP
Enabled
Configurable TDP Level
Level 2
Intel® Turbo Boost Technology
Enabled
Energy Efficient Turbo
Disabled
Hardware P-States
Disabled
Package C-State
C0/C1 state
C1E
Disabled
Processor C6
Disabled
Enable global SR-IOV and VT-d settings in the firmware for the host. These settings are relevant to bare-metal environments.
6.4. Connectivity prerequisites for managed cluster networks
Before you can install and provision a managed cluster with the GitOps Zero Touch Provisioning (ZTP) pipeline, the managed cluster host must meet the following networking prerequisites:
- There must be bi-directional connectivity between the GitOps ZTP container in the hub cluster and the Baseboard Management Controller (BMC) of the target bare-metal host.
The managed cluster must be able to resolve and reach the API hostname of the hub hostname and
*.apps
hostname. Here is an example of the API hostname of the hub and*.apps
hostname:-
api.hub-cluster.internal.domain.com
-
console-openshift-console.apps.hub-cluster.internal.domain.com
-
The hub cluster must be able to resolve and reach the API and
*.apps
hostname of the managed cluster. Here is an example of the API hostname of the managed cluster and*.apps
hostname:-
api.sno-managed-cluster-1.internal.domain.com
-
console-openshift-console.apps.sno-managed-cluster-1.internal.domain.com
-
6.5. Workload partitioning in single-node OpenShift with GitOps ZTP
Workload partitioning configures OpenShift Container Platform services, cluster management workloads, and infrastructure pods to run on a reserved number of host CPUs.
To configure workload partitioning with GitOps Zero Touch Provisioning (ZTP), you configure a cpuPartitioningMode
field in the SiteConfig
custom resource (CR) that you use to install the cluster and you apply a PerformanceProfile
CR that configures the isolated
and reserved
CPUs on the host.
Configuring the SiteConfig
CR enables workload partitioning at cluster installation time and applying the PerformanceProfile
CR configures the specific allocation of CPUs to reserved and isolated sets. Both of these steps happen at different points during cluster provisioning.
Configuring workload partitioning by using the cpuPartitioningMode
field in the SiteConfig
CR is a Tech Preview feature in OpenShift Container Platform 4.13.
Alternatively, you can specify cluster management CPU resources with the cpuset
field of the SiteConfig
custom resource (CR) and the reserved
field of the group PolicyGenerator
or PolicyGentemplate
CR. The GitOps ZTP pipeline uses these values to populate the required fields in the workload partitioning MachineConfig
CR (cpuset
) and the PerformanceProfile
CR (reserved
) that configure the single-node OpenShift cluster. This method is a General Availability feature in OpenShift Container Platform 4.14.
The workload partitioning configuration pins the OpenShift Container Platform infrastructure pods to the reserved
CPU set. Platform services such as systemd, CRI-O, and kubelet run on the reserved
CPU set. The isolated
CPU sets are exclusively allocated to your container workloads. Isolating CPUs ensures that the workload has guaranteed access to the specified CPUs without contention from other applications running on the same node. All CPUs that are not isolated should be reserved.
Ensure that reserved
and isolated
CPU sets do not overlap with each other.
Additional resources
- For the recommended single-node OpenShift workload partitioning configuration, see Workload partitioning.
6.6. About disk encryption with TPM and PCR protection
You can use the diskEncryption
field in the SiteConfig
custom resource (CR) to configure disk encryption with Trusted Platform Module (TPM) and Platform Configuration Registers (PCRs) protection.
TPM is a hardware component that stores cryptographic keys and evaluates the security state of your system. PCRs within the TPM store hash values that represent the current hardware and software configuration of your system. You can use the following PCR registers to protect the encryption keys for disk encryption:
- PCR 1
- Represents the Unified Extensible Firmware Interface (UEFI) state.
- PCR 7
- Represents the secure boot state.
The TPM safeguards encryption keys by linking them to the system’s current state, as recorded in PCR 1 and PCR 7. The dmcrypt
utility uses these keys to encrypt the disk. The binding between the encryption keys and the expected PCR registers is automatically updated after upgrades, if needed.
During the system boot process, the dmcrypt
utility uses the TPM PCR values to unlock the disk. If the current PCR values match with the previously linked values, the unlock succeeds. If the PCR values do not match, the encryption keys cannot be released, and the disk remains encrypted and inaccessible.
Configuring disk encryption by using the diskEncryption
field in the SiteConfig
CR is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.
For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.
Additional resources
- TPM encryption
- For information about enabling disk encryption, see Enabling disk encryption with TPM and PCR protection.
6.7. Recommended cluster install manifests
The ZTP pipeline applies the following custom resources (CRs) during cluster installation. These configuration CRs ensure that the cluster meets the feature and performance requirements necessary for running a vDU application.
When using the GitOps ZTP plugin and SiteConfig
CRs for cluster deployment, the following MachineConfig
CRs are included by default.
Use the SiteConfig
extraManifests
filter to alter the CRs that are included by default. For more information, see Advanced managed cluster configuration with SiteConfig CRs.
6.7.1. Workload partitioning
Single-node OpenShift clusters that run DU workloads require workload partitioning. This limits the cores allowed to run platform services, maximizing the CPU core for application payloads.
Workload partitioning can be enabled during cluster installation only. You cannot disable workload partitioning postinstallation. You can however change the set of CPUs assigned to the isolated and reserved sets through the PerformanceProfile
CR. Changes to CPU settings cause the node to reboot.
When transitioning to using cpuPartitioningMode
for enabling workload partitioning, remove the workload partitioning MachineConfig
CRs from the /extra-manifest
folder that you use to provision the cluster.
Recommended SiteConfig
CR configuration for workload partitioning
apiVersion: ran.openshift.io/v1
kind: SiteConfig
metadata:
name: "<site_name>"
namespace: "<site_name>"
spec:
baseDomain: "example.com"
cpuPartitioningMode: AllNodes 1
- 1
- Set the
cpuPartitioningMode
field toAllNodes
to configure workload partitioning for all nodes in the cluster.
Verification
Check that the applications and cluster system CPU pinning is correct. Run the following commands:
Open a remote shell prompt to the managed cluster:
$ oc debug node/example-sno-1
Check that the OpenShift infrastructure applications CPU pinning is correct:
sh-4.4# pgrep ovn | while read i; do taskset -cp $i; done
Example output
pid 8481's current affinity list: 0-1,52-53 pid 8726's current affinity list: 0-1,52-53 pid 9088's current affinity list: 0-1,52-53 pid 9945's current affinity list: 0-1,52-53 pid 10387's current affinity list: 0-1,52-53 pid 12123's current affinity list: 0-1,52-53 pid 13313's current affinity list: 0-1,52-53
Check that the system applications CPU pinning is correct:
sh-4.4# pgrep systemd | while read i; do taskset -cp $i; done
Example output
pid 1's current affinity list: 0-1,52-53 pid 938's current affinity list: 0-1,52-53 pid 962's current affinity list: 0-1,52-53 pid 1197's current affinity list: 0-1,52-53
6.7.2. Reduced platform management footprint
To reduce the overall management footprint of the platform, a MachineConfig
custom resource (CR) is required that places all Kubernetes-specific mount points in a new namespace separate from the host operating system. The following base64-encoded example MachineConfig
CR illustrates this configuration.
Recommended container mount namespace configuration (01-container-mount-ns-and-kubelet-conf-master.yaml
)
# Automatically generated by extra-manifests-builder # Do not make changes directly. apiVersion: machineconfiguration.openshift.io/v1 kind: MachineConfig metadata: labels: machineconfiguration.openshift.io/role: master name: container-mount-namespace-and-kubelet-conf-master spec: config: ignition: version: 3.2.0 storage: files: - contents: source: data:text/plain;charset=utf-8;base64,IyEvYmluL2Jhc2gKCmRlYnVnKCkgewogIGVjaG8gJEAgPiYyCn0KCnVzYWdlKCkgewogIGVjaG8gVXNhZ2U6ICQoYmFzZW5hbWUgJDApIFVOSVQgW2VudmZpbGUgW3Zhcm5hbWVdXQogIGVjaG8KICBlY2hvIEV4dHJhY3QgdGhlIGNvbnRlbnRzIG9mIHRoZSBmaXJzdCBFeGVjU3RhcnQgc3RhbnphIGZyb20gdGhlIGdpdmVuIHN5c3RlbWQgdW5pdCBhbmQgcmV0dXJuIGl0IHRvIHN0ZG91dAogIGVjaG8KICBlY2hvICJJZiAnZW52ZmlsZScgaXMgcHJvdmlkZWQsIHB1dCBpdCBpbiB0aGVyZSBpbnN0ZWFkLCBhcyBhbiBlbnZpcm9ubWVudCB2YXJpYWJsZSBuYW1lZCAndmFybmFtZSciCiAgZWNobyAiRGVmYXVsdCAndmFybmFtZScgaXMgRVhFQ1NUQVJUIGlmIG5vdCBzcGVjaWZpZWQiCiAgZXhpdCAxCn0KClVOSVQ9JDEKRU5WRklMRT0kMgpWQVJOQU1FPSQzCmlmIFtbIC16ICRVTklUIHx8ICRVTklUID09ICItLWhlbHAiIHx8ICRVTklUID09ICItaCIgXV07IHRoZW4KICB1c2FnZQpmaQpkZWJ1ZyAiRXh0cmFjdGluZyBFeGVjU3RhcnQgZnJvbSAkVU5JVCIKRklMRT0kKHN5c3RlbWN0bCBjYXQgJFVOSVQgfCBoZWFkIC1uIDEpCkZJTEU9JHtGSUxFI1wjIH0KaWYgW1sgISAtZiAkRklMRSBdXTsgdGhlbgogIGRlYnVnICJGYWlsZWQgdG8gZmluZCByb290IGZpbGUgZm9yIHVuaXQgJFVOSVQgKCRGSUxFKSIKICBleGl0CmZpCmRlYnVnICJTZXJ2aWNlIGRlZmluaXRpb24gaXMgaW4gJEZJTEUiCkVYRUNTVEFSVD0kKHNlZCAtbiAtZSAnL15FeGVjU3RhcnQ9LipcXCQvLC9bXlxcXSQvIHsgcy9eRXhlY1N0YXJ0PS8vOyBwIH0nIC1lICcvXkV4ZWNTdGFydD0uKlteXFxdJC8geyBzL15FeGVjU3RhcnQ9Ly87IHAgfScgJEZJTEUpCgppZiBbWyAkRU5WRklMRSBdXTsgdGhlbgogIFZBUk5BTUU9JHtWQVJOQU1FOi1FWEVDU1RBUlR9CiAgZWNobyAiJHtWQVJOQU1FfT0ke0VYRUNTVEFSVH0iID4gJEVOVkZJTEUKZWxzZQogIGVjaG8gJEVYRUNTVEFSVApmaQo= mode: 493 path: /usr/local/bin/extractExecStart - contents: source: data:text/plain;charset=utf-8;base64,IyEvYmluL2Jhc2gKbnNlbnRlciAtLW1vdW50PS9ydW4vY29udGFpbmVyLW1vdW50LW5hbWVzcGFjZS9tbnQgIiRAIgo= mode: 493 path: /usr/local/bin/nsenterCmns systemd: units: - contents: | [Unit] Description=Manages a mount namespace that both kubelet and crio can use to share their container-specific mounts [Service] Type=oneshot RemainAfterExit=yes RuntimeDirectory=container-mount-namespace Environment=RUNTIME_DIRECTORY=%t/container-mount-namespace Environment=BIND_POINT=%t/container-mount-namespace/mnt ExecStartPre=bash -c "findmnt ${RUNTIME_DIRECTORY} || mount --make-unbindable --bind ${RUNTIME_DIRECTORY} ${RUNTIME_DIRECTORY}" ExecStartPre=touch ${BIND_POINT} ExecStart=unshare --mount=${BIND_POINT} --propagation slave mount --make-rshared / ExecStop=umount -R ${RUNTIME_DIRECTORY} name: container-mount-namespace.service - dropins: - contents: | [Unit] Wants=container-mount-namespace.service After=container-mount-namespace.service [Service] ExecStartPre=/usr/local/bin/extractExecStart %n /%t/%N-execstart.env ORIG_EXECSTART EnvironmentFile=-/%t/%N-execstart.env ExecStart= ExecStart=bash -c "nsenter --mount=%t/container-mount-namespace/mnt \ ${ORIG_EXECSTART}" name: 90-container-mount-namespace.conf name: crio.service - dropins: - contents: | [Unit] Wants=container-mount-namespace.service After=container-mount-namespace.service [Service] ExecStartPre=/usr/local/bin/extractExecStart %n /%t/%N-execstart.env ORIG_EXECSTART EnvironmentFile=-/%t/%N-execstart.env ExecStart= ExecStart=bash -c "nsenter --mount=%t/container-mount-namespace/mnt \ ${ORIG_EXECSTART} --housekeeping-interval=30s" name: 90-container-mount-namespace.conf - contents: | [Service] Environment="OPENSHIFT_MAX_HOUSEKEEPING_INTERVAL_DURATION=60s" Environment="OPENSHIFT_EVICTION_MONITORING_PERIOD_DURATION=30s" name: 30-kubelet-interval-tuning.conf name: kubelet.service
6.7.3. SCTP
Stream Control Transmission Protocol (SCTP) is a key protocol used in RAN applications. This MachineConfig
object adds the SCTP kernel module to the node to enable this protocol.
Recommended control plane node SCTP configuration (03-sctp-machine-config-master.yaml
)
# Automatically generated by extra-manifests-builder # Do not make changes directly. apiVersion: machineconfiguration.openshift.io/v1 kind: MachineConfig metadata: labels: machineconfiguration.openshift.io/role: master name: load-sctp-module-master spec: config: ignition: version: 2.2.0 storage: files: - contents: source: data:, verification: {} filesystem: root mode: 420 path: /etc/modprobe.d/sctp-blacklist.conf - contents: source: data:text/plain;charset=utf-8,sctp filesystem: root mode: 420 path: /etc/modules-load.d/sctp-load.conf
Recommended worker node SCTP configuration (03-sctp-machine-config-worker.yaml
)
# Automatically generated by extra-manifests-builder # Do not make changes directly. apiVersion: machineconfiguration.openshift.io/v1 kind: MachineConfig metadata: labels: machineconfiguration.openshift.io/role: worker name: load-sctp-module-worker spec: config: ignition: version: 2.2.0 storage: files: - contents: source: data:, verification: {} filesystem: root mode: 420 path: /etc/modprobe.d/sctp-blacklist.conf - contents: source: data:text/plain;charset=utf-8,sctp filesystem: root mode: 420 path: /etc/modules-load.d/sctp-load.conf
6.7.4. Setting rcu_normal
The following MachineConfig
CR configures the system to set rcu_normal
to 1 after the system has finished startup. This improves kernel latency for vDU applications.
Recommended configuration for disabling rcu_expedited
after the node has finished startup (08-set-rcu-normal-master.yaml
)
# Automatically generated by extra-manifests-builder # Do not make changes directly. apiVersion: machineconfiguration.openshift.io/v1 kind: MachineConfig metadata: labels: machineconfiguration.openshift.io/role: master name: 08-set-rcu-normal-master spec: config: ignition: version: 3.2.0 storage: files: - contents: source: data:text/plain;charset=utf-8;base64,IyEvYmluL2Jhc2gKIwojIERpc2FibGUgcmN1X2V4cGVkaXRlZCBhZnRlciBub2RlIGhhcyBmaW5pc2hlZCBib290aW5nCiMKIyBUaGUgZGVmYXVsdHMgYmVsb3cgY2FuIGJlIG92ZXJyaWRkZW4gdmlhIGVudmlyb25tZW50IHZhcmlhYmxlcwojCgojIERlZmF1bHQgd2FpdCB0aW1lIGlzIDYwMHMgPSAxMG06Ck1BWElNVU1fV0FJVF9USU1FPSR7TUFYSU1VTV9XQUlUX1RJTUU6LTYwMH0KCiMgRGVmYXVsdCBzdGVhZHktc3RhdGUgdGhyZXNob2xkID0gMiUKIyBBbGxvd2VkIHZhbHVlczoKIyAgNCAgLSBhYnNvbHV0ZSBwb2QgY291bnQgKCsvLSkKIyAgNCUgLSBwZXJjZW50IGNoYW5nZSAoKy8tKQojICAtMSAtIGRpc2FibGUgdGhlIHN0ZWFkeS1zdGF0ZSBjaGVjawpTVEVBRFlfU1RBVEVfVEhSRVNIT0xEPSR7U1RFQURZX1NUQVRFX1RIUkVTSE9MRDotMiV9CgojIERlZmF1bHQgc3RlYWR5LXN0YXRlIHdpbmRvdyA9IDYwcwojIElmIHRoZSBydW5uaW5nIHBvZCBjb3VudCBzdGF5cyB3aXRoaW4gdGhlIGdpdmVuIHRocmVzaG9sZCBmb3IgdGhpcyB0aW1lCiMgcGVyaW9kLCByZXR1cm4gQ1BVIHV0aWxpemF0aW9uIHRvIG5vcm1hbCBiZWZvcmUgdGhlIG1heGltdW0gd2FpdCB0aW1lIGhhcwojIGV4cGlyZXMKU1RFQURZX1NUQVRFX1dJTkRPVz0ke1NURUFEWV9TVEFURV9XSU5ET1c6LTYwfQoKIyBEZWZhdWx0IHN0ZWFkeS1zdGF0ZSBhbGxvd3MgYW55IHBvZCBjb3VudCB0byBiZSAic3RlYWR5IHN0YXRlIgojIEluY3JlYXNpbmcgdGhpcyB3aWxsIHNraXAgYW55IHN0ZWFkeS1zdGF0ZSBjaGVja3MgdW50aWwgdGhlIGNvdW50IHJpc2VzIGFib3ZlCiMgdGhpcyBudW1iZXIgdG8gYXZvaWQgZmFsc2UgcG9zaXRpdmVzIGlmIHRoZXJlIGFyZSBzb21lIHBlcmlvZHMgd2hlcmUgdGhlCiMgY291bnQgZG9lc24ndCBpbmNyZWFzZSBidXQgd2Uga25vdyB3ZSBjYW4ndCBiZSBhdCBzdGVhZHktc3RhdGUgeWV0LgpTVEVBRFlfU1RBVEVfTUlOSU1VTT0ke1NURUFEWV9TVEFURV9NSU5JTVVNOi0wfQoKIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIwoKd2l0aGluKCkgewogIGxvY2FsIGxhc3Q9JDEgY3VycmVudD0kMiB0aHJlc2hvbGQ9JDMKICBsb2NhbCBkZWx0YT0wIHBjaGFuZ2UKICBkZWx0YT0kKCggY3VycmVudCAtIGxhc3QgKSkKICBpZiBbWyAkY3VycmVudCAtZXEgJGxhc3QgXV07IHRoZW4KICAgIHBjaGFuZ2U9MAogIGVsaWYgW1sgJGxhc3QgLWVxIDAgXV07IHRoZW4KICAgIHBjaGFuZ2U9MTAwMDAwMAogIGVsc2UKICAgIHBjaGFuZ2U9JCgoICggIiRkZWx0YSIgKiAxMDApIC8gbGFzdCApKQogIGZpCiAgZWNobyAtbiAibGFzdDokbGFzdCBjdXJyZW50OiRjdXJyZW50IGRlbHRhOiRkZWx0YSBwY2hhbmdlOiR7cGNoYW5nZX0lOiAiCiAgbG9jYWwgYWJzb2x1dGUgbGltaXQKICBjYXNlICR0aHJlc2hvbGQgaW4KICAgIColKQogICAgICBhYnNvbHV0ZT0ke3BjaGFuZ2UjIy19ICMgYWJzb2x1dGUgdmFsdWUKICAgICAgbGltaXQ9JHt0aHJlc2hvbGQlJSV9CiAgICAgIDs7CiAgICAqKQogICAgICBhYnNvbHV0ZT0ke2RlbHRhIyMtfSAjIGFic29sdXRlIHZhbHVlCiAgICAgIGxpbWl0PSR0aHJlc2hvbGQKICAgICAgOzsKICBlc2FjCiAgaWYgW1sgJGFic29sdXRlIC1sZSAkbGltaXQgXV07IHRoZW4KICAgIGVjaG8gIndpdGhpbiAoKy8tKSR0aHJlc2hvbGQiCiAgICByZXR1cm4gMAogIGVsc2UKICAgIGVjaG8gIm91dHNpZGUgKCsvLSkkdGhyZXNob2xkIgogICAgcmV0dXJuIDEKICBmaQp9CgpzdGVhZHlzdGF0ZSgpIHsKICBsb2NhbCBsYXN0PSQxIGN1cnJlbnQ9JDIKICBpZiBbWyAkbGFzdCAtbHQgJFNURUFEWV9TVEFURV9NSU5JTVVNIF1dOyB0aGVuCiAgICBlY2hvICJsYXN0OiRsYXN0IGN1cnJlbnQ6JGN1cnJlbnQgV2FpdGluZyB0byByZWFjaCAkU1RFQURZX1NUQVRFX01JTklNVU0gYmVmb3JlIGNoZWNraW5nIGZvciBzdGVhZHktc3RhdGUiCiAgICByZXR1cm4gMQogIGZpCiAgd2l0aGluICIkbGFzdCIgIiRjdXJyZW50IiAiJFNURUFEWV9TVEFURV9USFJFU0hPTEQiCn0KCndhaXRGb3JSZWFkeSgpIHsKICBsb2dnZXIgIlJlY292ZXJ5OiBXYWl0aW5nICR7TUFYSU1VTV9XQUlUX1RJTUV9cyBmb3IgdGhlIGluaXRpYWxpemF0aW9uIHRvIGNvbXBsZXRlIgogIGxvY2FsIHQ9MCBzPTEwCiAgbG9jYWwgbGFzdENjb3VudD0wIGNjb3VudD0wIHN0ZWFkeVN0YXRlVGltZT0wCiAgd2hpbGUgW1sgJHQgLWx0ICRNQVhJTVVNX1dBSVRfVElNRSBdXTsgZG8KICAgIHNsZWVwICRzCiAgICAoKHQgKz0gcykpCiAgICAjIERldGVjdCBzdGVhZHktc3RhdGUgcG9kIGNvdW50CiAgICBjY291bnQ9JChjcmljdGwgcHMgMj4vZGV2L251bGwgfCB3YyAtbCkKICAgIGlmIFtbICRjY291bnQgLWd0IDAgXV0gJiYgc3RlYWR5c3RhdGUgIiRsYXN0Q2NvdW50IiAiJGNjb3VudCI7IHRoZW4KICAgICAgKChzdGVhZHlTdGF0ZVRpbWUgKz0gcykpCiAgICAgIGVjaG8gIlN0ZWFkeS1zdGF0ZSBmb3IgJHtzdGVhZHlTdGF0ZVRpbWV9cy8ke1NURUFEWV9TVEFURV9XSU5ET1d9cyIKICAgICAgaWYgW1sgJHN0ZWFkeVN0YXRlVGltZSAtZ2UgJFNURUFEWV9TVEFURV9XSU5ET1cgXV07IHRoZW4KICAgICAgICBsb2dnZXIgIlJlY292ZXJ5OiBTdGVhZHktc3RhdGUgKCsvLSAkU1RFQURZX1NUQVRFX1RIUkVTSE9MRCkgZm9yICR7U1RFQURZX1NUQVRFX1dJTkRPV31zOiBEb25lIgogICAgICAgIHJldHVybiAwCiAgICAgIGZpCiAgICBlbHNlCiAgICAgIGlmIFtbICRzdGVhZHlTdGF0ZVRpbWUgLWd0IDAgXV07IHRoZW4KICAgICAgICBlY2hvICJSZXNldHRpbmcgc3RlYWR5LXN0YXRlIHRpbWVyIgogICAgICAgIHN0ZWFkeVN0YXRlVGltZT0wCiAgICAgIGZpCiAgICBmaQogICAgbGFzdENjb3VudD0kY2NvdW50CiAgZG9uZQogIGxvZ2dlciAiUmVjb3Zlcnk6IFJlY292ZXJ5IENvbXBsZXRlIFRpbWVvdXQiCn0KCnNldFJjdU5vcm1hbCgpIHsKICBlY2hvICJTZXR0aW5nIHJjdV9ub3JtYWwgdG8gMSIKICBlY2hvIDEgPiAvc3lzL2tlcm5lbC9yY3Vfbm9ybWFsCn0KCm1haW4oKSB7CiAgd2FpdEZvclJlYWR5CiAgZWNobyAiV2FpdGluZyBmb3Igc3RlYWR5IHN0YXRlIHRvb2s6ICQoYXdrICd7cHJpbnQgaW50KCQxLzM2MDApImgiLCBpbnQoKCQxJTM2MDApLzYwKSJtIiwgaW50KCQxJTYwKSJzIn0nIC9wcm9jL3VwdGltZSkiCiAgc2V0UmN1Tm9ybWFsCn0KCmlmIFtbICIke0JBU0hfU09VUkNFWzBdfSIgPSAiJHswfSIgXV07IHRoZW4KICBtYWluICIke0B9IgogIGV4aXQgJD8KZmkK mode: 493 path: /usr/local/bin/set-rcu-normal.sh systemd: units: - contents: | [Unit] Description=Disable rcu_expedited after node has finished booting by setting rcu_normal to 1 [Service] Type=simple ExecStart=/usr/local/bin/set-rcu-normal.sh # Maximum wait time is 600s = 10m: Environment=MAXIMUM_WAIT_TIME=600 # Steady-state threshold = 2% # Allowed values: # 4 - absolute pod count (+/-) # 4% - percent change (+/-) # -1 - disable the steady-state check # Note: '%' must be escaped as '%%' in systemd unit files Environment=STEADY_STATE_THRESHOLD=2%% # Steady-state window = 120s # If the running pod count stays within the given threshold for this time # period, return CPU utilization to normal before the maximum wait time has # expires Environment=STEADY_STATE_WINDOW=120 # Steady-state minimum = 40 # Increasing this will skip any steady-state checks until the count rises above # this number to avoid false positives if there are some periods where the # count doesn't increase but we know we can't be at steady-state yet. Environment=STEADY_STATE_MINIMUM=40 [Install] WantedBy=multi-user.target enabled: true name: set-rcu-normal.service
6.7.5. Automatic kernel crash dumps with kdump
kdump
is a Linux kernel feature that creates a kernel crash dump when the kernel crashes. kdump
is enabled with the following MachineConfig
CRs.
Recommended MachineConfig
CR to remove ice driver from control plane kdump logs (05-kdump-config-master.yaml
)
apiVersion: machineconfiguration.openshift.io/v1 kind: MachineConfig metadata: labels: machineconfiguration.openshift.io/role: master name: 05-kdump-config-master spec: config: ignition: version: 3.2.0 systemd: units: - enabled: true name: kdump-remove-ice-module.service contents: | [Unit] Description=Remove ice module when doing kdump Before=kdump.service [Service] Type=oneshot RemainAfterExit=true ExecStart=/usr/local/bin/kdump-remove-ice-module.sh [Install] WantedBy=multi-user.target storage: files: - contents: source: data:text/plain;charset=utf-8;base64,IyEvdXNyL2Jpbi9lbnYgYmFzaAoKIyBUaGlzIHNjcmlwdCByZW1vdmVzIHRoZSBpY2UgbW9kdWxlIGZyb20ga2R1bXAgdG8gcHJldmVudCBrZHVtcCBmYWlsdXJlcyBvbiBjZXJ0YWluIHNlcnZlcnMuCiMgVGhpcyBpcyBhIHRlbXBvcmFyeSB3b3JrYXJvdW5kIGZvciBSSEVMUExBTi0xMzgyMzYgYW5kIGNhbiBiZSByZW1vdmVkIHdoZW4gdGhhdCBpc3N1ZSBpcwojIGZpeGVkLgoKc2V0IC14CgpTRUQ9Ii91c3IvYmluL3NlZCIKR1JFUD0iL3Vzci9iaW4vZ3JlcCIKCiMgb3ZlcnJpZGUgZm9yIHRlc3RpbmcgcHVycG9zZXMKS0RVTVBfQ09ORj0iJHsxOi0vZXRjL3N5c2NvbmZpZy9rZHVtcH0iClJFTU9WRV9JQ0VfU1RSPSJtb2R1bGVfYmxhY2tsaXN0PWljZSIKCiMgZXhpdCBpZiBmaWxlIGRvZXNuJ3QgZXhpc3QKWyAhIC1mICR7S0RVTVBfQ09ORn0gXSAmJiBleGl0IDAKCiMgZXhpdCBpZiBmaWxlIGFscmVhZHkgdXBkYXRlZAoke0dSRVB9IC1GcSAke1JFTU9WRV9JQ0VfU1RSfSAke0tEVU1QX0NPTkZ9ICYmIGV4aXQgMAoKIyBUYXJnZXQgbGluZSBsb29rcyBzb21ldGhpbmcgbGlrZSB0aGlzOgojIEtEVU1QX0NPTU1BTkRMSU5FX0FQUEVORD0iaXJxcG9sbCBucl9jcHVzPTEgLi4uIGhlc3RfZGlzYWJsZSIKIyBVc2Ugc2VkIHRvIG1hdGNoIGV2ZXJ5dGhpbmcgYmV0d2VlbiB0aGUgcXVvdGVzIGFuZCBhcHBlbmQgdGhlIFJFTU9WRV9JQ0VfU1RSIHRvIGl0CiR7U0VEfSAtaSAncy9eS0RVTVBfQ09NTUFORExJTkVfQVBQRU5EPSJbXiJdKi8mICcke1JFTU9WRV9JQ0VfU1RSfScvJyAke0tEVU1QX0NPTkZ9IHx8IGV4aXQgMAo= mode: 448 path: /usr/local/bin/kdump-remove-ice-module.sh
Recommended control plane node kdump configuration (06-kdump-master.yaml
)
# Automatically generated by extra-manifests-builder # Do not make changes directly. apiVersion: machineconfiguration.openshift.io/v1 kind: MachineConfig metadata: labels: machineconfiguration.openshift.io/role: master name: 06-kdump-enable-master spec: config: ignition: version: 3.2.0 systemd: units: - enabled: true name: kdump.service kernelArguments: - crashkernel=512M
Recommended MachineConfig
CR to remove ice driver from worker node kdump logs (05-kdump-config-worker.yaml
)
apiVersion: machineconfiguration.openshift.io/v1 kind: MachineConfig metadata: labels: machineconfiguration.openshift.io/role: worker name: 05-kdump-config-worker spec: config: ignition: version: 3.2.0 systemd: units: - enabled: true name: kdump-remove-ice-module.service contents: | [Unit] Description=Remove ice module when doing kdump Before=kdump.service [Service] Type=oneshot RemainAfterExit=true ExecStart=/usr/local/bin/kdump-remove-ice-module.sh [Install] WantedBy=multi-user.target storage: files: - contents: source: data:text/plain;charset=utf-8;base64,IyEvdXNyL2Jpbi9lbnYgYmFzaAoKIyBUaGlzIHNjcmlwdCByZW1vdmVzIHRoZSBpY2UgbW9kdWxlIGZyb20ga2R1bXAgdG8gcHJldmVudCBrZHVtcCBmYWlsdXJlcyBvbiBjZXJ0YWluIHNlcnZlcnMuCiMgVGhpcyBpcyBhIHRlbXBvcmFyeSB3b3JrYXJvdW5kIGZvciBSSEVMUExBTi0xMzgyMzYgYW5kIGNhbiBiZSByZW1vdmVkIHdoZW4gdGhhdCBpc3N1ZSBpcwojIGZpeGVkLgoKc2V0IC14CgpTRUQ9Ii91c3IvYmluL3NlZCIKR1JFUD0iL3Vzci9iaW4vZ3JlcCIKCiMgb3ZlcnJpZGUgZm9yIHRlc3RpbmcgcHVycG9zZXMKS0RVTVBfQ09ORj0iJHsxOi0vZXRjL3N5c2NvbmZpZy9rZHVtcH0iClJFTU9WRV9JQ0VfU1RSPSJtb2R1bGVfYmxhY2tsaXN0PWljZSIKCiMgZXhpdCBpZiBmaWxlIGRvZXNuJ3QgZXhpc3QKWyAhIC1mICR7S0RVTVBfQ09ORn0gXSAmJiBleGl0IDAKCiMgZXhpdCBpZiBmaWxlIGFscmVhZHkgdXBkYXRlZAoke0dSRVB9IC1GcSAke1JFTU9WRV9JQ0VfU1RSfSAke0tEVU1QX0NPTkZ9ICYmIGV4aXQgMAoKIyBUYXJnZXQgbGluZSBsb29rcyBzb21ldGhpbmcgbGlrZSB0aGlzOgojIEtEVU1QX0NPTU1BTkRMSU5FX0FQUEVORD0iaXJxcG9sbCBucl9jcHVzPTEgLi4uIGhlc3RfZGlzYWJsZSIKIyBVc2Ugc2VkIHRvIG1hdGNoIGV2ZXJ5dGhpbmcgYmV0d2VlbiB0aGUgcXVvdGVzIGFuZCBhcHBlbmQgdGhlIFJFTU9WRV9JQ0VfU1RSIHRvIGl0CiR7U0VEfSAtaSAncy9eS0RVTVBfQ09NTUFORExJTkVfQVBQRU5EPSJbXiJdKi8mICcke1JFTU9WRV9JQ0VfU1RSfScvJyAke0tEVU1QX0NPTkZ9IHx8IGV4aXQgMAo= mode: 448 path: /usr/local/bin/kdump-remove-ice-module.sh
Recommended kdump worker node configuration (06-kdump-worker.yaml
)
# Automatically generated by extra-manifests-builder # Do not make changes directly. apiVersion: machineconfiguration.openshift.io/v1 kind: MachineConfig metadata: labels: machineconfiguration.openshift.io/role: worker name: 06-kdump-enable-worker spec: config: ignition: version: 3.2.0 systemd: units: - enabled: true name: kdump.service kernelArguments: - crashkernel=512M
6.7.6. Disable automatic CRI-O cache wipe
After an uncontrolled host shutdown or cluster reboot, CRI-O automatically deletes the entire CRI-O cache, causing all images to be pulled from the registry when the node reboots. This can result in unacceptably slow recovery times or recovery failures. To prevent this from happening in single-node OpenShift clusters that you install with GitOps ZTP, disable the CRI-O delete cache feature during cluster installation.
Recommended MachineConfig
CR to disable CRI-O cache wipe on control plane nodes (99-crio-disable-wipe-master.yaml
)
# Automatically generated by extra-manifests-builder # Do not make changes directly. apiVersion: machineconfiguration.openshift.io/v1 kind: MachineConfig metadata: labels: machineconfiguration.openshift.io/role: master name: 99-crio-disable-wipe-master spec: config: ignition: version: 3.2.0 storage: files: - contents: source: data:text/plain;charset=utf-8;base64,W2NyaW9dCmNsZWFuX3NodXRkb3duX2ZpbGUgPSAiIgo= mode: 420 path: /etc/crio/crio.conf.d/99-crio-disable-wipe.toml
Recommended MachineConfig
CR to disable CRI-O cache wipe on worker nodes (99-crio-disable-wipe-worker.yaml
)
# Automatically generated by extra-manifests-builder # Do not make changes directly. apiVersion: machineconfiguration.openshift.io/v1 kind: MachineConfig metadata: labels: machineconfiguration.openshift.io/role: worker name: 99-crio-disable-wipe-worker spec: config: ignition: version: 3.2.0 storage: files: - contents: source: data:text/plain;charset=utf-8;base64,W2NyaW9dCmNsZWFuX3NodXRkb3duX2ZpbGUgPSAiIgo= mode: 420 path: /etc/crio/crio.conf.d/99-crio-disable-wipe.toml
6.7.7. Configuring crun as the default container runtime
The following ContainerRuntimeConfig
custom resources (CRs) configure crun as the default OCI container runtime for control plane and worker nodes. The crun container runtime is fast and lightweight and has a low memory footprint.
For optimal performance, enable crun for control plane and worker nodes in single-node OpenShift, three-node OpenShift, and standard clusters. To avoid the cluster rebooting when the CR is applied, apply the change as a GitOps ZTP additional Day 0 install-time manifest.
Recommended ContainerRuntimeConfig
CR for control plane nodes (enable-crun-master.yaml
)
apiVersion: machineconfiguration.openshift.io/v1 kind: ContainerRuntimeConfig metadata: name: enable-crun-master spec: machineConfigPoolSelector: matchLabels: pools.operator.machineconfiguration.openshift.io/master: "" containerRuntimeConfig: defaultRuntime: crun
Recommended ContainerRuntimeConfig
CR for worker nodes (enable-crun-worker.yaml
)
apiVersion: machineconfiguration.openshift.io/v1 kind: ContainerRuntimeConfig metadata: name: enable-crun-worker spec: machineConfigPoolSelector: matchLabels: pools.operator.machineconfiguration.openshift.io/worker: "" containerRuntimeConfig: defaultRuntime: crun
6.7.8. Enabling disk encryption with TPM and PCR protection
You can use the diskEncryption
field in the SiteConfig
custom resource (CR) to configure disk encryption with Trusted Platform Module (TPM) and Platform Configuration Registers (PCRs) protection.
Configuring the SiteConfig
CR enables disk encryption at the time of cluster installation.
Prerequisites
-
You have installed the OpenShift CLI (
oc
). -
You have logged in as a user with
cluster-admin
privileges. - You read the "About disk encryption with TPM and PCR protection" section.
Procedure
Configure the
spec.clusters.diskEncryption
field in theSiteConfig
CR:Recommended
SiteConfig
CR configuration to enable disk encryption with PCR protectionapiVersion: ran.openshift.io/v1 kind: SiteConfig metadata: name: "encryption-tpm2" namespace: "encryption-tpm2" spec: clusters: - clusterName: "encryption-tpm2" clusterImageSetNameRef: "openshift-v4.13.0" diskEncryption: type: "tpm2" 1 tpm2: pcrList: "1,7" 2 nodes: - hostName: "node1" role: master
Verification
Check that the disk encryption with TPM and PCR protection is enabled by running the following command:
$ clevis luks list -d <disk_path> 1
- 1
- Replace
<disk_path>
with the path to the disk. For example,/dev/sda4
.
Example output
1: tpm2 '{"hash":"sha256","key":"ecc","pcr_bank":"sha256","pcr_ids":"1,7"}'
Additional resources
6.8. Recommended postinstallation cluster configurations
When the cluster installation is complete, the ZTP pipeline applies the following custom resources (CRs) that are required to run DU workloads.
In GitOps ZTP v4.10 and earlier, you configure UEFI secure boot with a MachineConfig
CR. This is no longer required in GitOps ZTP v4.11 and later. In v4.11, you configure UEFI secure boot for single-node OpenShift clusters by updating the spec.clusters.nodes.bootMode
field in the SiteConfig
CR that you use to install the cluster. For more information, see Deploying a managed cluster with SiteConfig and GitOps ZTP.
6.8.1. Operators
Single-node OpenShift clusters that run DU workloads require the following Operators to be installed:
- Local Storage Operator
- Logging Operator
- PTP Operator
- SR-IOV Network Operator
You also need to configure a custom CatalogSource
CR, disable the default OperatorHub
configuration, and configure an ImageContentSourcePolicy
mirror registry that is accessible from the clusters that you install.
Recommended Storage Operator namespace and Operator group configuration (StorageNS.yaml
, StorageOperGroup.yaml
)
--- apiVersion: v1 kind: Namespace metadata: name: openshift-local-storage annotations: workload.openshift.io/allowed: management --- apiVersion: operators.coreos.com/v1 kind: OperatorGroup metadata: name: openshift-local-storage namespace: openshift-local-storage annotations: {} spec: targetNamespaces: - openshift-local-storage
Recommended Cluster Logging Operator namespace and Operator group configuration (ClusterLogNS.yaml
, ClusterLogOperGroup.yaml
)
--- apiVersion: v1 kind: Namespace metadata: name: openshift-logging annotations: workload.openshift.io/allowed: management --- apiVersion: operators.coreos.com/v1 kind: OperatorGroup metadata: name: cluster-logging namespace: openshift-logging annotations: {} spec: targetNamespaces: - openshift-logging
Recommended PTP Operator namespace and Operator group configuration (PtpSubscriptionNS.yaml
, PtpSubscriptionOperGroup.yaml
)
--- apiVersion: v1 kind: Namespace metadata: name: openshift-ptp annotations: workload.openshift.io/allowed: management labels: openshift.io/cluster-monitoring: "true" --- apiVersion: operators.coreos.com/v1 kind: OperatorGroup metadata: name: ptp-operators namespace: openshift-ptp annotations: {} spec: targetNamespaces: - openshift-ptp
Recommended SR-IOV Operator namespace and Operator group configuration (SriovSubscriptionNS.yaml
, SriovSubscriptionOperGroup.yaml
)
--- apiVersion: v1 kind: Namespace metadata: name: openshift-sriov-network-operator annotations: workload.openshift.io/allowed: management --- apiVersion: operators.coreos.com/v1 kind: OperatorGroup metadata: name: sriov-network-operators namespace: openshift-sriov-network-operator annotations: {} spec: targetNamespaces: - openshift-sriov-network-operator
Recommended CatalogSource
configuration (DefaultCatsrc.yaml
)
apiVersion: operators.coreos.com/v1alpha1 kind: CatalogSource metadata: name: default-cat-source namespace: openshift-marketplace annotations: target.workload.openshift.io/management: '{"effect": "PreferredDuringScheduling"}' spec: displayName: default-cat-source image: $imageUrl publisher: Red Hat sourceType: grpc updateStrategy: registryPoll: interval: 1h status: connectionState: lastObservedState: READY
Recommended ImageContentSourcePolicy
configuration (DisconnectedICSP.yaml
)
apiVersion: operator.openshift.io/v1alpha1 kind: ImageContentSourcePolicy metadata: name: disconnected-internal-icsp annotations: {} spec: # repositoryDigestMirrors: # - $mirrors
Recommended OperatorHub
configuration (OperatorHub.yaml
)
apiVersion: config.openshift.io/v1 kind: OperatorHub metadata: name: cluster annotations: {} spec: disableAllDefaultSources: true
6.8.2. Operator subscriptions
Single-node OpenShift clusters that run DU workloads require the following Subscription
CRs. The subscription provides the location to download the following Operators:
- Local Storage Operator
- Logging Operator
- PTP Operator
- SR-IOV Network Operator
- SRIOV-FEC Operator
For each Operator subscription, specify the channel to get the Operator from. The recommended channel is stable
.
You can specify Manual
or Automatic
updates. In Automatic
mode, the Operator automatically updates to the latest versions in the channel as they become available in the registry. In Manual
mode, new Operator versions are installed only when they are explicitly approved.
Use Manual
mode for subscriptions. This allows you to control the timing of Operator updates to fit within scheduled maintenance windows.
Recommended Local Storage Operator subscription (StorageSubscription.yaml
)
apiVersion: operators.coreos.com/v1alpha1 kind: Subscription metadata: name: local-storage-operator namespace: openshift-local-storage annotations: {} spec: channel: "stable" name: local-storage-operator source: redhat-operators-disconnected sourceNamespace: openshift-marketplace installPlanApproval: Manual status: state: AtLatestKnown
Recommended SR-IOV Operator subscription (SriovSubscription.yaml
)
apiVersion: operators.coreos.com/v1alpha1 kind: Subscription metadata: name: sriov-network-operator-subscription namespace: openshift-sriov-network-operator annotations: {} spec: channel: "stable" name: sriov-network-operator source: redhat-operators-disconnected sourceNamespace: openshift-marketplace installPlanApproval: Manual status: state: AtLatestKnown
Recommended PTP Operator subscription (PtpSubscription.yaml
)
--- apiVersion: operators.coreos.com/v1alpha1 kind: Subscription metadata: name: ptp-operator-subscription namespace: openshift-ptp annotations: {} spec: channel: "stable" name: ptp-operator source: redhat-operators-disconnected sourceNamespace: openshift-marketplace installPlanApproval: Manual status: state: AtLatestKnown
Recommended Cluster Logging Operator subscription (ClusterLogSubscription.yaml
)
apiVersion: operators.coreos.com/v1alpha1 kind: Subscription metadata: name: cluster-logging namespace: openshift-logging annotations: {} spec: channel: "stable-6.0" name: cluster-logging source: redhat-operators-disconnected sourceNamespace: openshift-marketplace installPlanApproval: Manual status: state: AtLatestKnown
6.8.3. Cluster logging and log forwarding
Single-node OpenShift clusters that run DU workloads require logging and log forwarding for debugging. The following custom resources (CRs) are required.
Recommended ClusterLogForwarder.yaml
apiVersion: "observability.openshift.io/v1" kind: ClusterLogForwarder metadata: name: instance namespace: openshift-logging annotations: {} spec: # outputs: $outputs # pipelines: $pipelines serviceAccount: name: logcollector #apiVersion: "observability.openshift.io/v1" #kind: ClusterLogForwarder #metadata: # name: instance # namespace: openshift-logging # spec: # outputs: # - type: "kafka" # name: kafka-open # # below url is an example # kafka: # url: tcp://10.46.55.190:9092/test # filters: # - name: test-labels # type: openshiftLabels # openshiftLabels: # label1: test1 # label2: test2 # label3: test3 # label4: test4 # pipelines: # - name: all-to-default # inputRefs: # - audit # - infrastructure # filterRefs: # - test-labels # outputRefs: # - kafka-open # serviceAccount: # name: logcollector
Set the spec.outputs.kafka.url
field to the URL of the Kafka server where the logs are forwarded to.
Recommended ClusterLogNS.yaml
--- apiVersion: v1 kind: Namespace metadata: name: openshift-logging annotations: workload.openshift.io/allowed: management
Recommended ClusterLogOperGroup.yaml
--- apiVersion: operators.coreos.com/v1 kind: OperatorGroup metadata: name: cluster-logging namespace: openshift-logging annotations: {} spec: targetNamespaces: - openshift-logging
Recommended ClusterLogServiceAccount.yaml
--- apiVersion: v1 kind: ServiceAccount metadata: name: logcollector namespace: openshift-logging annotations: {}
Recommended ClusterLogServiceAccountAuditBinding.yaml
--- apiVersion: rbac.authorization.k8s.io/v1 kind: ClusterRoleBinding metadata: name: logcollector-audit-logs-binding annotations: {} roleRef: apiGroup: rbac.authorization.k8s.io kind: ClusterRole name: collect-audit-logs subjects: - kind: ServiceAccount name: logcollector namespace: openshift-logging
Recommended ClusterLogServiceAccountInfrastructureBinding.yaml
--- apiVersion: rbac.authorization.k8s.io/v1 kind: ClusterRoleBinding metadata: name: logcollector-infrastructure-logs-binding annotations: {} roleRef: apiGroup: rbac.authorization.k8s.io kind: ClusterRole name: collect-infrastructure-logs subjects: - kind: ServiceAccount name: logcollector namespace: openshift-logging
Recommended ClusterLogSubscription.yaml
apiVersion: operators.coreos.com/v1alpha1 kind: Subscription metadata: name: cluster-logging namespace: openshift-logging annotations: {} spec: channel: "stable-6.0" name: cluster-logging source: redhat-operators-disconnected sourceNamespace: openshift-marketplace installPlanApproval: Manual status: state: AtLatestKnown
6.8.4. Performance profile
Single-node OpenShift clusters that run DU workloads require a Node Tuning Operator performance profile to use real-time host capabilities and services.
In earlier versions of OpenShift Container Platform, the Performance Addon Operator was used to implement automatic tuning to achieve low latency performance for OpenShift applications. In OpenShift Container Platform 4.11 and later, this functionality is part of the Node Tuning Operator.
The following example PerformanceProfile
CR illustrates the required single-node OpenShift cluster configuration.
Recommended performance profile configuration (PerformanceProfile.yaml
)
apiVersion: performance.openshift.io/v2 kind: PerformanceProfile metadata: # if you change this name make sure the 'include' line in TunedPerformancePatch.yaml # matches this name: include=openshift-node-performance-${PerformanceProfile.metadata.name} # Also in file 'validatorCRs/informDuValidator.yaml': # name: 50-performance-${PerformanceProfile.metadata.name} name: openshift-node-performance-profile annotations: ran.openshift.io/reference-configuration: "ran-du.redhat.com" spec: additionalKernelArgs: - "rcupdate.rcu_normal_after_boot=0" - "efi=runtime" - "vfio_pci.enable_sriov=1" - "vfio_pci.disable_idle_d3=1" - "module_blacklist=irdma" cpu: isolated: $isolated reserved: $reserved hugepages: defaultHugepagesSize: $defaultHugepagesSize pages: - size: $size count: $count node: $node machineConfigPoolSelector: pools.operator.machineconfiguration.openshift.io/$mcp: "" nodeSelector: node-role.kubernetes.io/$mcp: '' numa: topologyPolicy: "restricted" # To use the standard (non-realtime) kernel, set enabled to false realTimeKernel: enabled: true workloadHints: # WorkloadHints defines the set of upper level flags for different type of workloads. # See https://github.com/openshift/cluster-node-tuning-operator/blob/master/docs/performanceprofile/performance_profile.md#workloadhints # for detailed descriptions of each item. # The configuration below is set for a low latency, performance mode. realTime: true highPowerConsumption: false perPodPowerManagement: false
PerformanceProfile CR field | Description |
---|---|
|
Ensure that
|
|
|
| Set the isolated CPUs. Ensure all of the Hyper-Threading pairs match. Important The reserved and isolated CPU pools must not overlap and together must span all available cores. CPU cores that are not accounted for cause an undefined behaviour in the system. |
| Set the reserved CPUs. When workload partitioning is enabled, system processes, kernel threads, and system container threads are restricted to these CPUs. All CPUs that are not isolated should be reserved. |
|
|
|
Set |
|
Use |
6.8.5. Configuring cluster time synchronization
Run a one-time system time synchronization job for control plane or worker nodes.
Recommended one time time-sync for control plane nodes (99-sync-time-once-master.yaml
)
# Automatically generated by extra-manifests-builder # Do not make changes directly. apiVersion: machineconfiguration.openshift.io/v1 kind: MachineConfig metadata: labels: machineconfiguration.openshift.io/role: master name: 99-sync-time-once-master spec: config: ignition: version: 3.2.0 systemd: units: - contents: | [Unit] Description=Sync time once After=network-online.target Wants=network-online.target [Service] Type=oneshot TimeoutStartSec=300 ExecCondition=/bin/bash -c 'systemctl is-enabled chronyd.service --quiet && exit 1 || exit 0' ExecStart=/usr/sbin/chronyd -n -f /etc/chrony.conf -q RemainAfterExit=yes [Install] WantedBy=multi-user.target enabled: true name: sync-time-once.service
Recommended one time time-sync for worker nodes (99-sync-time-once-worker.yaml
)
# Automatically generated by extra-manifests-builder # Do not make changes directly. apiVersion: machineconfiguration.openshift.io/v1 kind: MachineConfig metadata: labels: machineconfiguration.openshift.io/role: worker name: 99-sync-time-once-worker spec: config: ignition: version: 3.2.0 systemd: units: - contents: | [Unit] Description=Sync time once After=network-online.target Wants=network-online.target [Service] Type=oneshot TimeoutStartSec=300 ExecCondition=/bin/bash -c 'systemctl is-enabled chronyd.service --quiet && exit 1 || exit 0' ExecStart=/usr/sbin/chronyd -n -f /etc/chrony.conf -q RemainAfterExit=yes [Install] WantedBy=multi-user.target enabled: true name: sync-time-once.service
6.8.6. PTP
Single-node OpenShift clusters use Precision Time Protocol (PTP) for network time synchronization. The following example PtpConfig
CRs illustrate the required PTP configurations for ordinary clocks, boundary clocks, and grandmaster clocks. The exact configuration you apply will depend on the node hardware and specific use case.
Recommended PTP ordinary clock configuration (PtpConfigSlave.yaml
)
apiVersion: ptp.openshift.io/v1 kind: PtpConfig metadata: name: du-ptp-slave namespace: openshift-ptp annotations: {} spec: profile: - name: "slave" # The interface name is hardware-specific interface: $interface ptp4lOpts: "-2 -s" phc2sysOpts: "-a -r -n 24" ptpSchedulingPolicy: SCHED_FIFO ptpSchedulingPriority: 10 ptpSettings: logReduce: "true" ptp4lConf: | [global] # # Default Data Set # twoStepFlag 1 slaveOnly 1 priority1 128 priority2 128 domainNumber 24 #utc_offset 37 clockClass 255 clockAccuracy 0xFE offsetScaledLogVariance 0xFFFF free_running 0 freq_est_interval 1 dscp_event 0 dscp_general 0 dataset_comparison G.8275.x G.8275.defaultDS.localPriority 128 # # Port Data Set # logAnnounceInterval -3 logSyncInterval -4 logMinDelayReqInterval -4 logMinPdelayReqInterval -4 announceReceiptTimeout 3 syncReceiptTimeout 0 delayAsymmetry 0 fault_reset_interval -4 neighborPropDelayThresh 20000000 masterOnly 0 G.8275.portDS.localPriority 128 # # Run time options # assume_two_step 0 logging_level 6 path_trace_enabled 0 follow_up_info 0 hybrid_e2e 0 inhibit_multicast_service 0 net_sync_monitor 0 tc_spanning_tree 0 tx_timestamp_timeout 50 unicast_listen 0 unicast_master_table 0 unicast_req_duration 3600 use_syslog 1 verbose 0 summary_interval 0 kernel_leap 1 check_fup_sync 0 clock_class_threshold 7 # # Servo Options # pi_proportional_const 0.0 pi_integral_const 0.0 pi_proportional_scale 0.0 pi_proportional_exponent -0.3 pi_proportional_norm_max 0.7 pi_integral_scale 0.0 pi_integral_exponent 0.4 pi_integral_norm_max 0.3 step_threshold 2.0 first_step_threshold 0.00002 max_frequency 900000000 clock_servo pi sanity_freq_limit 200000000 ntpshm_segment 0 # # Transport options # transportSpecific 0x0 ptp_dst_mac 01:1B:19:00:00:00 p2p_dst_mac 01:80:C2:00:00:0E udp_ttl 1 udp6_scope 0x0E uds_address /var/run/ptp4l # # Default interface options # clock_type OC network_transport L2 delay_mechanism E2E time_stamping hardware tsproc_mode filter delay_filter moving_median delay_filter_length 10 egressLatency 0 ingressLatency 0 boundary_clock_jbod 0 # # Clock description # productDescription ;; revisionData ;; manufacturerIdentity 00:00:00 userDescription ; timeSource 0xA0 recommend: - profile: "slave" priority: 4 match: - nodeLabel: "node-role.kubernetes.io/$mcp"
Recommended boundary clock configuration (PtpConfigBoundary.yaml
)
apiVersion: ptp.openshift.io/v1 kind: PtpConfig metadata: name: boundary namespace: openshift-ptp annotations: {} spec: profile: - name: "boundary" ptp4lOpts: "-2" phc2sysOpts: "-a -r -n 24" ptpSchedulingPolicy: SCHED_FIFO ptpSchedulingPriority: 10 ptpSettings: logReduce: "true" ptp4lConf: | # The interface name is hardware-specific [$iface_slave] masterOnly 0 [$iface_master_1] masterOnly 1 [$iface_master_2] masterOnly 1 [$iface_master_3] masterOnly 1 [global] # # Default Data Set # twoStepFlag 1 slaveOnly 0 priority1 128 priority2 128 domainNumber 24 #utc_offset 37 clockClass 248 clockAccuracy 0xFE offsetScaledLogVariance 0xFFFF free_running 0 freq_est_interval 1 dscp_event 0 dscp_general 0 dataset_comparison G.8275.x G.8275.defaultDS.localPriority 128 # # Port Data Set # logAnnounceInterval -3 logSyncInterval -4 logMinDelayReqInterval -4 logMinPdelayReqInterval -4 announceReceiptTimeout 3 syncReceiptTimeout 0 delayAsymmetry 0 fault_reset_interval -4 neighborPropDelayThresh 20000000 masterOnly 0 G.8275.portDS.localPriority 128 # # Run time options # assume_two_step 0 logging_level 6 path_trace_enabled 0 follow_up_info 0 hybrid_e2e 0 inhibit_multicast_service 0 net_sync_monitor 0 tc_spanning_tree 0 tx_timestamp_timeout 50 unicast_listen 0 unicast_master_table 0 unicast_req_duration 3600 use_syslog 1 verbose 0 summary_interval 0 kernel_leap 1 check_fup_sync 0 clock_class_threshold 135 # # Servo Options # pi_proportional_const 0.0 pi_integral_const 0.0 pi_proportional_scale 0.0 pi_proportional_exponent -0.3 pi_proportional_norm_max 0.7 pi_integral_scale 0.0 pi_integral_exponent 0.4 pi_integral_norm_max 0.3 step_threshold 2.0 first_step_threshold 0.00002 max_frequency 900000000 clock_servo pi sanity_freq_limit 200000000 ntpshm_segment 0 # # Transport options # transportSpecific 0x0 ptp_dst_mac 01:1B:19:00:00:00 p2p_dst_mac 01:80:C2:00:00:0E udp_ttl 1 udp6_scope 0x0E uds_address /var/run/ptp4l # # Default interface options # clock_type BC network_transport L2 delay_mechanism E2E time_stamping hardware tsproc_mode filter delay_filter moving_median delay_filter_length 10 egressLatency 0 ingressLatency 0 boundary_clock_jbod 0 # # Clock description # productDescription ;; revisionData ;; manufacturerIdentity 00:00:00 userDescription ; timeSource 0xA0 recommend: - profile: "boundary" priority: 4 match: - nodeLabel: "node-role.kubernetes.io/$mcp"
Recommended PTP Westport Channel e810 grandmaster clock configuration (PtpConfigGmWpc.yaml
)
# The grandmaster profile is provided for testing only # It is not installed on production clusters apiVersion: ptp.openshift.io/v1 kind: PtpConfig metadata: name: grandmaster namespace: openshift-ptp annotations: {} spec: profile: - name: "grandmaster" ptp4lOpts: "-2 --summary_interval -4" phc2sysOpts: -r -u 0 -m -w -N 8 -R 16 -s $iface_master -n 24 ptpSchedulingPolicy: SCHED_FIFO ptpSchedulingPriority: 10 ptpSettings: logReduce: "true" plugins: e810: enableDefaultConfig: false settings: LocalMaxHoldoverOffSet: 1500 LocalHoldoverTimeout: 14400 MaxInSpecOffset: 100 pins: $e810_pins # "$iface_master": # "U.FL2": "0 2" # "U.FL1": "0 1" # "SMA2": "0 2" # "SMA1": "0 1" ublxCmds: - args: #ubxtool -P 29.20 -z CFG-HW-ANT_CFG_VOLTCTRL,1 - "-P" - "29.20" - "-z" - "CFG-HW-ANT_CFG_VOLTCTRL,1" reportOutput: false - args: #ubxtool -P 29.20 -e GPS - "-P" - "29.20" - "-e" - "GPS" reportOutput: false - args: #ubxtool -P 29.20 -d Galileo - "-P" - "29.20" - "-d" - "Galileo" reportOutput: false - args: #ubxtool -P 29.20 -d GLONASS - "-P" - "29.20" - "-d" - "GLONASS" reportOutput: false - args: #ubxtool -P 29.20 -d BeiDou - "-P" - "29.20" - "-d" - "BeiDou" reportOutput: false - args: #ubxtool -P 29.20 -d SBAS - "-P" - "29.20" - "-d" - "SBAS" reportOutput: false - args: #ubxtool -P 29.20 -t -w 5 -v 1 -e SURVEYIN,600,50000 - "-P" - "29.20" - "-t" - "-w" - "5" - "-v" - "1" - "-e" - "SURVEYIN,600,50000" reportOutput: true - args: #ubxtool -P 29.20 -p MON-HW - "-P" - "29.20" - "-p" - "MON-HW" reportOutput: true - args: #ubxtool -P 29.20 -p CFG-MSG,1,38,248 - "-P" - "29.20" - "-p" - "CFG-MSG,1,38,248" reportOutput: true ts2phcOpts: " " ts2phcConf: | [nmea] ts2phc.master 1 [global] use_syslog 0 verbose 1 logging_level 7 ts2phc.pulsewidth 100000000 #cat /dev/GNSS to find available serial port #example value of gnss_serialport is /dev/ttyGNSS_1700_0 ts2phc.nmea_serialport $gnss_serialport leapfile /usr/share/zoneinfo/leap-seconds.list [$iface_master] ts2phc.extts_polarity rising ts2phc.extts_correction 0 ptp4lConf: | [$iface_master] masterOnly 1 [$iface_master_1] masterOnly 1 [$iface_master_2] masterOnly 1 [$iface_master_3] masterOnly 1 [global] # # Default Data Set # twoStepFlag 1 priority1 128 priority2 128 domainNumber 24 #utc_offset 37 clockClass 6 clockAccuracy 0x27 offsetScaledLogVariance 0xFFFF free_running 0 freq_est_interval 1 dscp_event 0 dscp_general 0 dataset_comparison G.8275.x G.8275.defaultDS.localPriority 128 # # Port Data Set # logAnnounceInterval -3 logSyncInterval -4 logMinDelayReqInterval -4 logMinPdelayReqInterval 0 announceReceiptTimeout 3 syncReceiptTimeout 0 delayAsymmetry 0 fault_reset_interval -4 neighborPropDelayThresh 20000000 masterOnly 0 G.8275.portDS.localPriority 128 # # Run time options # assume_two_step 0 logging_level 6 path_trace_enabled 0 follow_up_info 0 hybrid_e2e 0 inhibit_multicast_service 0 net_sync_monitor 0 tc_spanning_tree 0 tx_timestamp_timeout 50 unicast_listen 0 unicast_master_table 0 unicast_req_duration 3600 use_syslog 1 verbose 0 summary_interval -4 kernel_leap 1 check_fup_sync 0 clock_class_threshold 7 # # Servo Options # pi_proportional_const 0.0 pi_integral_const 0.0 pi_proportional_scale 0.0 pi_proportional_exponent -0.3 pi_proportional_norm_max 0.7 pi_integral_scale 0.0 pi_integral_exponent 0.4 pi_integral_norm_max 0.3 step_threshold 2.0 first_step_threshold 0.00002 clock_servo pi sanity_freq_limit 200000000 ntpshm_segment 0 # # Transport options # transportSpecific 0x0 ptp_dst_mac 01:1B:19:00:00:00 p2p_dst_mac 01:80:C2:00:00:0E udp_ttl 1 udp6_scope 0x0E uds_address /var/run/ptp4l # # Default interface options # clock_type BC network_transport L2 delay_mechanism E2E time_stamping hardware tsproc_mode filter delay_filter moving_median delay_filter_length 10 egressLatency 0 ingressLatency 0 boundary_clock_jbod 0 # # Clock description # productDescription ;; revisionData ;; manufacturerIdentity 00:00:00 userDescription ; timeSource 0x20 recommend: - profile: "grandmaster" priority: 4 match: - nodeLabel: "node-role.kubernetes.io/$mcp"
The following optional PtpOperatorConfig
CR configures PTP events reporting for the node.
Recommended PTP events configuration (PtpOperatorConfigForEvent.yaml
)
apiVersion: ptp.openshift.io/v1 kind: PtpOperatorConfig metadata: name: default namespace: openshift-ptp annotations: {} spec: daemonNodeSelector: node-role.kubernetes.io/$mcp: "" ptpEventConfig: apiVersion: $event_api_version enableEventPublisher: true transportHost: "http://ptp-event-publisher-service-NODE_NAME.openshift-ptp.svc.cluster.local:9043"
6.8.7. Extended Tuned profile
Single-node OpenShift clusters that run DU workloads require additional performance tuning configurations necessary for high-performance workloads. The following example Tuned
CR extends the Tuned
profile:
Recommended extended Tuned
profile configuration (TunedPerformancePatch.yaml
)
apiVersion: tuned.openshift.io/v1 kind: Tuned metadata: name: performance-patch namespace: openshift-cluster-node-tuning-operator annotations: {} spec: profile: - name: performance-patch # Please note: # - The 'include' line must match the associated PerformanceProfile name, following below pattern # include=openshift-node-performance-${PerformanceProfile.metadata.name} # - When using the standard (non-realtime) kernel, remove the kernel.timer_migration override from # the [sysctl] section and remove the entire section if it is empty. data: | [main] summary=Configuration changes profile inherited from performance created tuned include=openshift-node-performance-openshift-node-performance-profile [scheduler] group.ice-ptp=0:f:10:*:ice-ptp.* group.ice-gnss=0:f:10:*:ice-gnss.* group.ice-dplls=0:f:10:*:ice-dplls.* [service] service.stalld=start,enable service.chronyd=stop,disable recommend: - machineConfigLabels: machineconfiguration.openshift.io/role: "$mcp" priority: 19 profile: performance-patch
Tuned CR field | Description |
---|---|
|
|
6.8.8. SR-IOV
Single root I/O virtualization (SR-IOV) is commonly used to enable fronthaul and midhaul networks. The following YAML example configures SR-IOV for a single-node OpenShift cluster.
The configuration of the SriovNetwork
CR will vary depending on your specific network and infrastructure requirements.
Recommended SriovOperatorConfig
CR configuration (SriovOperatorConfig.yaml
)
apiVersion: sriovnetwork.openshift.io/v1 kind: SriovOperatorConfig metadata: name: default namespace: openshift-sriov-network-operator annotations: {} spec: configDaemonNodeSelector: "node-role.kubernetes.io/$mcp": "" # Injector and OperatorWebhook pods can be disabled (set to "false") below # to reduce the number of management pods. It is recommended to start with the # webhook and injector pods enabled, and only disable them after verifying the # correctness of user manifests. # If the injector is disabled, containers using sr-iov resources must explicitly assign # them in the "requests"/"limits" section of the container spec, for example: # containers: # - name: my-sriov-workload-container # resources: # limits: # openshift.io/<resource_name>: "1" # requests: # openshift.io/<resource_name>: "1" enableInjector: false enableOperatorWebhook: false logLevel: 0
SriovOperatorConfig CR field | Description |
---|---|
|
Disable For example: containers: - name: my-sriov-workload-container resources: limits: openshift.io/<resource_name>: "1" requests: openshift.io/<resource_name>: "1" |
|
Disable |
Recommended SriovNetwork
configuration (SriovNetwork.yaml
)
apiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetwork metadata: name: "" namespace: openshift-sriov-network-operator annotations: {} spec: # resourceName: "" networkNamespace: openshift-sriov-network-operator # vlan: "" # spoofChk: "" # ipam: "" # linkState: "" # maxTxRate: "" # minTxRate: "" # vlanQoS: "" # trust: "" # capabilities: ""
SriovNetwork CR field | Description |
---|---|
|
Configure |
Recommended SriovNetworkNodePolicy
CR configuration (SriovNetworkNodePolicy.yaml
)
apiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetworkNodePolicy metadata: name: $name namespace: openshift-sriov-network-operator annotations: {} spec: # The attributes for Mellanox/Intel based NICs as below. # deviceType: netdevice/vfio-pci # isRdma: true/false deviceType: $deviceType isRdma: $isRdma nicSelector: # The exact physical function name must match the hardware used pfNames: [$pfNames] nodeSelector: node-role.kubernetes.io/$mcp: "" numVfs: $numVfs priority: $priority resourceName: $resourceName
SriovNetworkNodePolicy CR field | Description |
---|---|
|
Configure |
| Specifies the interface connected to the fronthaul network. |
| Specifies the number of VFs for the fronthaul network. |
| The exact name of physical function must match the hardware. |
Recommended SR-IOV kernel configurations (07-sriov-related-kernel-args-master.yaml
)
# Automatically generated by extra-manifests-builder # Do not make changes directly. apiVersion: machineconfiguration.openshift.io/v1 kind: MachineConfig metadata: labels: machineconfiguration.openshift.io/role: master name: 07-sriov-related-kernel-args-master spec: config: ignition: version: 3.2.0 kernelArguments: - intel_iommu=on - iommu=pt
6.8.9. Console Operator
Use the cluster capabilities feature to prevent the Console Operator from being installed. When the node is centrally managed it is not needed. Removing the Operator provides additional space and capacity for application workloads.
To disable the Console Operator during the installation of the managed cluster, set the following in the spec.clusters.0.installConfigOverrides
field of the SiteConfig
custom resource (CR):
installConfigOverrides: "{\"capabilities\":{\"baselineCapabilitySet\": \"None\" }}"
6.8.10. Alertmanager
Single-node OpenShift clusters that run DU workloads require reduced CPU resources consumed by the OpenShift Container Platform monitoring components. The following ConfigMap
custom resource (CR) disables Alertmanager.
Recommended cluster monitoring configuration (ReduceMonitoringFootprint.yaml
)
apiVersion: v1 kind: ConfigMap metadata: name: cluster-monitoring-config namespace: openshift-monitoring annotations: {} data: config.yaml: | alertmanagerMain: enabled: false telemeterClient: enabled: false prometheusK8s: retention: 24h
6.8.11. Operator Lifecycle Manager
Single-node OpenShift clusters that run distributed unit workloads require consistent access to CPU resources. Operator Lifecycle Manager (OLM) collects performance data from Operators at regular intervals, resulting in an increase in CPU utilisation. The following ConfigMap
custom resource (CR) disables the collection of Operator performance data by OLM.
Recommended cluster OLM configuration (ReduceOLMFootprint.yaml
)
apiVersion: v1 kind: ConfigMap metadata: name: collect-profiles-config namespace: openshift-operator-lifecycle-manager data: pprof-config.yaml: | disabled: True
6.8.12. LVM Storage
You can dynamically provision local storage on single-node OpenShift clusters with Logical Volume Manager (LVM) Storage.
The recommended storage solution for single-node OpenShift is the Local Storage Operator. Alternatively, you can use LVM Storage but it requires additional CPU resources to be allocated.
The following YAML example configures the storage of the node to be available to OpenShift Container Platform applications.
Recommended LVMCluster
configuration (StorageLVMCluster.yaml
)
apiVersion: lvm.topolvm.io/v1alpha1 kind: LVMCluster metadata: name: lvmcluster namespace: openshift-storage annotations: {} spec: {} #example: creating a vg1 volume group leveraging all available disks on the node # except the installation disk. # storage: # deviceClasses: # - name: vg1 # thinPoolConfig: # name: thin-pool-1 # sizePercent: 90 # overprovisionRatio: 10
LVMCluster CR field | Description |
---|---|
| Configure the disks used for LVM storage. If no disks are specified, the LVM Storage uses all the unused disks in the specified thin pool. |
6.8.13. Network diagnostics
Single-node OpenShift clusters that run DU workloads require less inter-pod network connectivity checks to reduce the additional load created by these pods. The following custom resource (CR) disables these checks.
Recommended network diagnostics configuration (DisableSnoNetworkDiag.yaml
)
apiVersion: operator.openshift.io/v1 kind: Network metadata: name: cluster annotations: {} spec: disableNetworkDiagnostics: true
Additional resources
Chapter 7. Validating single-node OpenShift cluster tuning for vDU application workloads
Before you can deploy virtual distributed unit (vDU) applications, you need to tune and configure the cluster host firmware and various other cluster configuration settings. Use the following information to validate the cluster configuration to support vDU workloads.
Additional resources
7.1. Recommended firmware configuration for vDU cluster hosts
Use the following table as the basis to configure the cluster host firmware for vDU applications running on OpenShift Container Platform 4.17.
The following table is a general recommendation for vDU cluster host firmware configuration. Exact firmware settings will depend on your requirements and specific hardware platform. Automatic setting of firmware is not handled by the zero touch provisioning pipeline.
Firmware setting | Configuration | Description |
---|---|---|
HyperTransport (HT) | Enabled | HyperTransport (HT) bus is a bus technology developed by AMD. HT provides a high-speed link between the components in the host memory and other system peripherals. |
UEFI | Enabled | Enable booting from UEFI for the vDU host. |
CPU Power and Performance Policy | Performance | Set CPU Power and Performance Policy to optimize the system for performance over energy efficiency. |
Uncore Frequency Scaling | Disabled | Disable Uncore Frequency Scaling to prevent the voltage and frequency of non-core parts of the CPU from being set independently. |
Uncore Frequency | Maximum | Sets the non-core parts of the CPU such as cache and memory controller to their maximum possible frequency of operation. |
Performance P-limit | Disabled | Disable Performance P-limit to prevent the Uncore frequency coordination of processors. |
Enhanced Intel® SpeedStep Tech | Enabled | Enable Enhanced Intel SpeedStep to allow the system to dynamically adjust processor voltage and core frequency that decreases power consumption and heat production in the host. |
Intel® Turbo Boost Technology | Enabled | Enable Turbo Boost Technology for Intel-based CPUs to automatically allow processor cores to run faster than the rated operating frequency if they are operating below power, current, and temperature specification limits. |
Intel Configurable TDP | Enabled | Enables Thermal Design Power (TDP) for the CPU. |
Configurable TDP Level | Level 2 | TDP level sets the CPU power consumption required for a particular performance rating. TDP level 2 sets the CPU to the most stable performance level at the cost of power consumption. |
Energy Efficient Turbo | Disabled | Disable Energy Efficient Turbo to prevent the processor from using an energy-efficiency based policy. |
Hardware P-States | Enabled or Disabled |
Enable OS-controlled P-States to allow power saving configurations. Disable |
Package C-State | C0/C1 state | Use C0 or C1 states to set the processor to a fully active state (C0) or to stop CPU internal clocks running in software (C1). |
C1E | Disabled | CPU Enhanced Halt (C1E) is a power saving feature in Intel chips. Disabling C1E prevents the operating system from sending a halt command to the CPU when inactive. |
Processor C6 | Disabled | C6 power-saving is a CPU feature that automatically disables idle CPU cores and cache. Disabling C6 improves system performance. |
Sub-NUMA Clustering | Disabled | Sub-NUMA clustering divides the processor cores, cache, and memory into multiple NUMA domains. Disabling this option can increase performance for latency-sensitive workloads. |
Enable global SR-IOV and VT-d settings in the firmware for the host. These settings are relevant to bare-metal environments.
Enable both C-states
and OS-controlled P-States
to allow per pod power management.
7.2. Recommended cluster configurations to run vDU applications
Clusters running virtualized distributed unit (vDU) applications require a highly tuned and optimized configuration. The following information describes the various elements that you require to support vDU workloads in OpenShift Container Platform 4.17 clusters.
7.2.1. Recommended cluster MachineConfig CRs for single-node OpenShift clusters
Check that the MachineConfig
custom resources (CRs) that you extract from the ztp-site-generate
container are applied in the cluster. The CRs can be found in the extracted out/source-crs/extra-manifest/
folder.
The following MachineConfig
CRs from the ztp-site-generate
container configure the cluster host:
MachineConfig CR | Description |
---|---|
| Configures the container mount namespace and kubelet configuration. |
|
Loads the SCTP kernel module. These |
| Configures kdump crash reporting for the cluster. |
| Configures SR-IOV kernel arguments in the cluster. |
|
Disables |
| Disables the automatic CRI-O cache wipe following cluster reboot. |
| Configures the one-time check and adjustment of the system clock by the Chrony service. |
|
Enables the |
| Enables cgroups v1 during cluster installation and when generating RHACM cluster policies. |
In OpenShift Container Platform 4.14 and later, you configure workload partitioning with the cpuPartitioningMode
field in the SiteConfig
CR.
7.2.2. Recommended cluster Operators
The following Operators are required for clusters running virtualized distributed unit (vDU) applications and are a part of the baseline reference configuration:
- Node Tuning Operator (NTO). NTO packages functionality that was previously delivered with the Performance Addon Operator, which is now a part of NTO.
- PTP Operator
- SR-IOV Network Operator
- Red Hat OpenShift Logging Operator
- Local Storage Operator
7.2.3. Recommended cluster kernel configuration
Always use the latest supported real-time kernel version in your cluster. Ensure that you apply the following configurations in the cluster:
Ensure that the following
additionalKernelArgs
are set in the cluster performance profile:apiVersion: performance.openshift.io/v2 kind: PerformanceProfile # ... spec: additionalKernelArgs: - "rcupdate.rcu_normal_after_boot=0" - "efi=runtime" - "vfio_pci.enable_sriov=1" - "vfio_pci.disable_idle_d3=1" - "module_blacklist=irdma" # ...
Optional: Set the CPU frequency under the
hardwareTuning
field:You can use hardware tuning to tune CPU frequencies for reserved and isolated core CPUs. For FlexRAN like applications, hardware vendors recommend that you run CPU frequencies below the default provided frequencies. It is highly recommended that, before setting any frequencies, you refer to the hardware vendor’s guidelines for maximum frequency settings for your processor generation. This example shows the frequencies for reserved and isolated CPUs for Sapphire Rapid hardware:
apiVersion: performance.openshift.io/v2 kind: PerformanceProfile metadata: name: openshift-node-performance-profile spec: cpu: isolated: "2-19,22-39" reserved: "0-1,20-21" hugepages: defaultHugepagesSize: 1G pages: - size: 1G count: 32 realTimeKernel: enabled: true hardwareTuning: isolatedCpuFreq: 2500000 reservedCpuFreq: 2800000
Ensure that the
performance-patch
profile in theTuned
CR configures the correct CPU isolation set that matches theisolated
CPU set in the relatedPerformanceProfile
CR, for example:apiVersion: tuned.openshift.io/v1 kind: Tuned metadata: name: performance-patch namespace: openshift-cluster-node-tuning-operator annotations: ran.openshift.io/ztp-deploy-wave: "10" spec: profile: - name: performance-patch # The 'include' line must match the associated PerformanceProfile name, for example: # include=openshift-node-performance-${PerformanceProfile.metadata.name} # When using the standard (non-realtime) kernel, remove the kernel.timer_migration override from the [sysctl] section data: | [main] summary=Configuration changes profile inherited from performance created tuned include=openshift-node-performance-openshift-node-performance-profile [scheduler] group.ice-ptp=0:f:10:*:ice-ptp.* group.ice-gnss=0:f:10:*:ice-gnss.* group.ice-dplls=0:f:10:*:ice-dplls.* [service] service.stalld=start,enable service.chronyd=stop,disable # ...
7.2.4. Checking the realtime kernel version
Always use the latest version of the realtime kernel in your OpenShift Container Platform clusters. If you are unsure about the kernel version that is in use in the cluster, you can compare the current realtime kernel version to the release version with the following procedure.
Prerequisites
-
You have installed the OpenShift CLI (
oc
). -
You are logged in as a user with
cluster-admin
privileges. -
You have installed
podman
.
Procedure
Run the following command to get the cluster version:
$ OCP_VERSION=$(oc get clusterversion version -o jsonpath='{.status.desired.version}{"\n"}')
Get the release image SHA number:
$ DTK_IMAGE=$(oc adm release info --image-for=driver-toolkit quay.io/openshift-release-dev/ocp-release:$OCP_VERSION-x86_64)
Run the release image container and extract the kernel version that is packaged with cluster’s current release:
$ podman run --rm $DTK_IMAGE rpm -qa | grep 'kernel-rt-core-' | sed 's#kernel-rt-core-##'
Example output
4.18.0-305.49.1.rt7.121.el8_4.x86_64
This is the default realtime kernel version that ships with the release.
NoteThe realtime kernel is denoted by the string
.rt
in the kernel version.
Verification
Check that the kernel version listed for the cluster’s current release matches actual realtime kernel that is running in the cluster. Run the following commands to check the running realtime kernel version:
Open a remote shell connection to the cluster node:
$ oc debug node/<node_name>
Check the realtime kernel version:
sh-4.4# uname -r
Example output
4.18.0-305.49.1.rt7.121.el8_4.x86_64
7.3. Checking that the recommended cluster configurations are applied
You can check that clusters are running the correct configuration. The following procedure describes how to check the various configurations that you require to deploy a DU application in OpenShift Container Platform 4.17 clusters.
Prerequisites
- You have deployed a cluster and tuned it for vDU workloads.
-
You have installed the OpenShift CLI (
oc
). -
You have logged in as a user with
cluster-admin
privileges.
Procedure
Check that the default OperatorHub sources are disabled. Run the following command:
$ oc get operatorhub cluster -o yaml
Example output
spec: disableAllDefaultSources: true
Check that all required
CatalogSource
resources are annotated for workload partitioning (PreferredDuringScheduling
) by running the following command:$ oc get catalogsource -A -o jsonpath='{range .items[*]}{.metadata.name}{" -- "}{.metadata.annotations.target\.workload\.openshift\.io/management}{"\n"}{end}'
Example output
certified-operators -- {"effect": "PreferredDuringScheduling"} community-operators -- {"effect": "PreferredDuringScheduling"} ran-operators 1 redhat-marketplace -- {"effect": "PreferredDuringScheduling"} redhat-operators -- {"effect": "PreferredDuringScheduling"}
- 1
CatalogSource
resources that are not annotated are also returned. In this example, theran-operators
CatalogSource
resource is not annotated and does not have thePreferredDuringScheduling
annotation.
NoteIn a properly configured vDU cluster, only a single annotated catalog source is listed.
Check that all applicable OpenShift Container Platform Operator namespaces are annotated for workload partitioning. This includes all Operators installed with core OpenShift Container Platform and the set of additional Operators included in the reference DU tuning configuration. Run the following command:
$ oc get namespaces -A -o jsonpath='{range .items[*]}{.metadata.name}{" -- "}{.metadata.annotations.workload\.openshift\.io/allowed}{"\n"}{end}'
Example output
default -- openshift-apiserver -- management openshift-apiserver-operator -- management openshift-authentication -- management openshift-authentication-operator -- management
ImportantAdditional Operators must not be annotated for workload partitioning. In the output from the previous command, additional Operators should be listed without any value on the right side of the
--
separator.Check that the
ClusterLogging
configuration is correct. Run the following commands:Validate that the appropriate input and output logs are configured:
$ oc get -n openshift-logging ClusterLogForwarder instance -o yaml
Example output
apiVersion: logging.openshift.io/v1 kind: ClusterLogForwarder metadata: creationTimestamp: "2022-07-19T21:51:41Z" generation: 1 name: instance namespace: openshift-logging resourceVersion: "1030342" uid: 8c1a842d-80c5-447a-9150-40350bdf40f0 spec: inputs: - infrastructure: {} name: infra-logs outputs: - name: kafka-open type: kafka url: tcp://10.46.55.190:9092/test pipelines: - inputRefs: - audit name: audit-logs outputRefs: - kafka-open - inputRefs: - infrastructure name: infrastructure-logs outputRefs: - kafka-open ...
Check that the curation schedule is appropriate for your application:
$ oc get -n openshift-logging clusterloggings.logging.openshift.io instance -o yaml
Example output
apiVersion: logging.openshift.io/v1 kind: ClusterLogging metadata: creationTimestamp: "2022-07-07T18:22:56Z" generation: 1 name: instance namespace: openshift-logging resourceVersion: "235796" uid: ef67b9b8-0e65-4a10-88ff-ec06922ea796 spec: collection: logs: fluentd: {} type: fluentd curation: curator: schedule: 30 3 * * * type: curator managementState: Managed ...
Check that the web console is disabled (
managementState: Removed
) by running the following command:$ oc get consoles.operator.openshift.io cluster -o jsonpath="{ .spec.managementState }"
Example output
Removed
Check that
chronyd
is disabled on the cluster node by running the following commands:$ oc debug node/<node_name>
Check the status of
chronyd
on the node:sh-4.4# chroot /host
sh-4.4# systemctl status chronyd
Example output
● chronyd.service - NTP client/server Loaded: loaded (/usr/lib/systemd/system/chronyd.service; disabled; vendor preset: enabled) Active: inactive (dead) Docs: man:chronyd(8) man:chrony.conf(5)
Check that the PTP interface is successfully synchronized to the primary clock using a remote shell connection to the
linuxptp-daemon
container and the PTP Management Client (pmc
) tool:Set the
$PTP_POD_NAME
variable with the name of thelinuxptp-daemon
pod by running the following command:$ PTP_POD_NAME=$(oc get pods -n openshift-ptp -l app=linuxptp-daemon -o name)
Run the following command to check the sync status of the PTP device:
$ oc -n openshift-ptp rsh -c linuxptp-daemon-container ${PTP_POD_NAME} pmc -u -f /var/run/ptp4l.0.config -b 0 'GET PORT_DATA_SET'
Example output
sending: GET PORT_DATA_SET 3cecef.fffe.7a7020-1 seq 0 RESPONSE MANAGEMENT PORT_DATA_SET portIdentity 3cecef.fffe.7a7020-1 portState SLAVE logMinDelayReqInterval -4 peerMeanPathDelay 0 logAnnounceInterval 1 announceReceiptTimeout 3 logSyncInterval 0 delayMechanism 1 logMinPdelayReqInterval 0 versionNumber 2 3cecef.fffe.7a7020-2 seq 0 RESPONSE MANAGEMENT PORT_DATA_SET portIdentity 3cecef.fffe.7a7020-2 portState LISTENING logMinDelayReqInterval 0 peerMeanPathDelay 0 logAnnounceInterval 1 announceReceiptTimeout 3 logSyncInterval 0 delayMechanism 1 logMinPdelayReqInterval 0 versionNumber 2
Run the following
pmc
command to check the PTP clock status:$ oc -n openshift-ptp rsh -c linuxptp-daemon-container ${PTP_POD_NAME} pmc -u -f /var/run/ptp4l.0.config -b 0 'GET TIME_STATUS_NP'
Example output
sending: GET TIME_STATUS_NP 3cecef.fffe.7a7020-0 seq 0 RESPONSE MANAGEMENT TIME_STATUS_NP master_offset 10 1 ingress_time 1657275432697400530 cumulativeScaledRateOffset +0.000000000 scaledLastGmPhaseChange 0 gmTimeBaseIndicator 0 lastGmPhaseChange 0x0000'0000000000000000.0000 gmPresent true 2 gmIdentity 3c2c30.ffff.670e00
Check that the expected
master offset
value corresponding to the value in/var/run/ptp4l.0.config
is found in thelinuxptp-daemon-container
log:$ oc logs $PTP_POD_NAME -n openshift-ptp -c linuxptp-daemon-container
Example output
phc2sys[56020.341]: [ptp4l.1.config] CLOCK_REALTIME phc offset -1731092 s2 freq -1546242 delay 497 ptp4l[56020.390]: [ptp4l.1.config] master offset -2 s2 freq -5863 path delay 541 ptp4l[56020.390]: [ptp4l.0.config] master offset -8 s2 freq -10699 path delay 533
Check that the SR-IOV configuration is correct by running the following commands:
Check that the
disableDrain
value in theSriovOperatorConfig
resource is set totrue
:$ oc get sriovoperatorconfig -n openshift-sriov-network-operator default -o jsonpath="{.spec.disableDrain}{'\n'}"
Example output
true
Check that the
SriovNetworkNodeState
sync status isSucceeded
by running the following command:$ oc get SriovNetworkNodeStates -n openshift-sriov-network-operator -o jsonpath="{.items[*].status.syncStatus}{'\n'}"
Example output
Succeeded
Verify that the expected number and configuration of virtual functions (
Vfs
) under each interface configured for SR-IOV is present and correct in the.status.interfaces
field. For example:$ oc get SriovNetworkNodeStates -n openshift-sriov-network-operator -o yaml
Example output
apiVersion: v1 items: - apiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetworkNodeState ... status: interfaces: ... - Vfs: - deviceID: 154c driver: vfio-pci pciAddress: 0000:3b:0a.0 vendor: "8086" vfID: 0 - deviceID: 154c driver: vfio-pci pciAddress: 0000:3b:0a.1 vendor: "8086" vfID: 1 - deviceID: 154c driver: vfio-pci pciAddress: 0000:3b:0a.2 vendor: "8086" vfID: 2 - deviceID: 154c driver: vfio-pci pciAddress: 0000:3b:0a.3 vendor: "8086" vfID: 3 - deviceID: 154c driver: vfio-pci pciAddress: 0000:3b:0a.4 vendor: "8086" vfID: 4 - deviceID: 154c driver: vfio-pci pciAddress: 0000:3b:0a.5 vendor: "8086" vfID: 5 - deviceID: 154c driver: vfio-pci pciAddress: 0000:3b:0a.6 vendor: "8086" vfID: 6 - deviceID: 154c driver: vfio-pci pciAddress: 0000:3b:0a.7 vendor: "8086" vfID: 7
Check that the cluster performance profile is correct. The
cpu
andhugepages
sections will vary depending on your hardware configuration. Run the following command:$ oc get PerformanceProfile openshift-node-performance-profile -o yaml
Example output
apiVersion: performance.openshift.io/v2 kind: PerformanceProfile metadata: creationTimestamp: "2022-07-19T21:51:31Z" finalizers: - foreground-deletion generation: 1 name: openshift-node-performance-profile resourceVersion: "33558" uid: 217958c0-9122-4c62-9d4d-fdc27c31118c spec: additionalKernelArgs: - idle=poll - rcupdate.rcu_normal_after_boot=0 - efi=runtime cpu: isolated: 2-51,54-103 reserved: 0-1,52-53 hugepages: defaultHugepagesSize: 1G pages: - count: 32 size: 1G machineConfigPoolSelector: pools.operator.machineconfiguration.openshift.io/master: "" net: userLevelNetworking: true nodeSelector: node-role.kubernetes.io/master: "" numa: topologyPolicy: restricted realTimeKernel: enabled: true status: conditions: - lastHeartbeatTime: "2022-07-19T21:51:31Z" lastTransitionTime: "2022-07-19T21:51:31Z" status: "True" type: Available - lastHeartbeatTime: "2022-07-19T21:51:31Z" lastTransitionTime: "2022-07-19T21:51:31Z" status: "True" type: Upgradeable - lastHeartbeatTime: "2022-07-19T21:51:31Z" lastTransitionTime: "2022-07-19T21:51:31Z" status: "False" type: Progressing - lastHeartbeatTime: "2022-07-19T21:51:31Z" lastTransitionTime: "2022-07-19T21:51:31Z" status: "False" type: Degraded runtimeClass: performance-openshift-node-performance-profile tuned: openshift-cluster-node-tuning-operator/openshift-node-performance-openshift-node-performance-profile
NoteCPU settings are dependent on the number of cores available on the server and should align with workload partitioning settings.
hugepages
configuration is server and application dependent.Check that the
PerformanceProfile
was successfully applied to the cluster by running the following command:$ oc get performanceprofile openshift-node-performance-profile -o jsonpath="{range .status.conditions[*]}{ @.type }{' -- '}{@.status}{'\n'}{end}"
Example output
Available -- True Upgradeable -- True Progressing -- False Degraded -- False
Check the
Tuned
performance patch settings by running the following command:$ oc get tuneds.tuned.openshift.io -n openshift-cluster-node-tuning-operator performance-patch -o yaml
Example output
apiVersion: tuned.openshift.io/v1 kind: Tuned metadata: creationTimestamp: "2022-07-18T10:33:52Z" generation: 1 name: performance-patch namespace: openshift-cluster-node-tuning-operator resourceVersion: "34024" uid: f9799811-f744-4179-bf00-32d4436c08fd spec: profile: - data: | [main] summary=Configuration changes profile inherited from performance created tuned include=openshift-node-performance-openshift-node-performance-profile [bootloader] cmdline_crash=nohz_full=2-23,26-47 1 [sysctl] kernel.timer_migration=1 [scheduler] group.ice-ptp=0:f:10:*:ice-ptp.* [service] service.stalld=start,enable service.chronyd=stop,disable name: performance-patch recommend: - machineConfigLabels: machineconfiguration.openshift.io/role: master priority: 19 profile: performance-patch
- 1
- The cpu list in
cmdline=nohz_full=
will vary based on your hardware configuration.
Check that cluster networking diagnostics are disabled by running the following command:
$ oc get networks.operator.openshift.io cluster -o jsonpath='{.spec.disableNetworkDiagnostics}'
Example output
true
Check that the
Kubelet
housekeeping interval is tuned to slower rate. This is set in thecontainerMountNS
machine config. Run the following command:$ oc describe machineconfig container-mount-namespace-and-kubelet-conf-master | grep OPENSHIFT_MAX_HOUSEKEEPING_INTERVAL_DURATION
Example output
Environment="OPENSHIFT_MAX_HOUSEKEEPING_INTERVAL_DURATION=60s"
Check that Grafana and
alertManagerMain
are disabled and that the Prometheus retention period is set to 24h by running the following command:$ oc get configmap cluster-monitoring-config -n openshift-monitoring -o jsonpath="{ .data.config\.yaml }"
Example output
grafana: enabled: false alertmanagerMain: enabled: false prometheusK8s: retention: 24h
Use the following commands to verify that Grafana and
alertManagerMain
routes are not found in the cluster:$ oc get route -n openshift-monitoring alertmanager-main
$ oc get route -n openshift-monitoring grafana
Both queries should return
Error from server (NotFound)
messages.
Check that there is a minimum of 4 CPUs allocated as
reserved
for each of thePerformanceProfile
,Tuned
performance-patch, workload partitioning, and kernel command line arguments by running the following command:$ oc get performanceprofile -o jsonpath="{ .items[0].spec.cpu.reserved }"
Example output
0-3
NoteDepending on your workload requirements, you might require additional reserved CPUs to be allocated.
Chapter 8. Advanced managed cluster configuration with SiteConfig resources
You can use SiteConfig
custom resources (CRs) to deploy custom functionality and configurations in your managed clusters at installation time.
8.1. Customizing extra installation manifests in the GitOps ZTP pipeline
You can define a set of extra manifests for inclusion in the installation phase of the GitOps Zero Touch Provisioning (ZTP) pipeline. These manifests are linked to the SiteConfig
custom resources (CRs) and are applied to the cluster during installation. Including MachineConfig
CRs at install time makes the installation process more efficient.
Prerequisites
- Create a Git repository where you manage your custom site configuration data. The repository must be accessible from the hub cluster and be defined as a source repository for the Argo CD application.
Procedure
- Create a set of extra manifest CRs that the GitOps ZTP pipeline uses to customize the cluster installs.
In your custom
/siteconfig
directory, create a subdirectory/custom-manifest
for your extra manifests. The following example illustrates a sample/siteconfig
with/custom-manifest
folder:siteconfig ├── site1-sno-du.yaml ├── site2-standard-du.yaml ├── extra-manifest/ └── custom-manifest └── 01-example-machine-config.yaml
NoteThe subdirectory names
/custom-manifest
and/extra-manifest
used throughout are example names only. There is no requirement to use these names and no restriction on how you name these subdirectories. In this example/extra-manifest
refers to the Git subdirectory that stores the contents of/extra-manifest
from theztp-site-generate
container.-
Add your custom extra manifest CRs to the
siteconfig/custom-manifest
directory. In your
SiteConfig
CR, enter the directory name in theextraManifests.searchPaths
field, for example:clusters: - clusterName: "example-sno" networkType: "OVNKubernetes" extraManifests: searchPaths: - extra-manifest/ 1 - custom-manifest/ 2
-
Save the
SiteConfig
,/extra-manifest
, and/custom-manifest
CRs, and push them to the site configuration repo.
During cluster provisioning, the GitOps ZTP pipeline appends the CRs in the /custom-manifest
directory to the default set of extra manifests stored in extra-manifest/
.
As of version 4.14 extraManifestPath
is subject to a deprecation warning.
While extraManifestPath
is still supported, we recommend that you use extraManifests.searchPaths
. If you define extraManifests.searchPaths
in the SiteConfig
file, the GitOps ZTP pipeline does not fetch manifests from the ztp-site-generate
container during site installation.
If you define both extraManifestPath
and extraManifests.searchPaths
in the Siteconfig
CR, the setting defined for extraManifests.searchPaths
takes precedence.
It is strongly recommended that you extract the contents of /extra-manifest
from the ztp-site-generate
container and push it to the GIT repository.
8.2. Filtering custom resources using SiteConfig filters
By using filters, you can easily customize SiteConfig
custom resources (CRs) to include or exclude other CRs for use in the installation phase of the GitOps Zero Touch Provisioning (ZTP) pipeline.
You can specify an inclusionDefault
value of include
or exclude
for the SiteConfig
CR, along with a list of the specific extraManifest
RAN CRs that you want to include or exclude. Setting inclusionDefault
to include
makes the GitOps ZTP pipeline apply all the files in /source-crs/extra-manifest
during installation. Setting inclusionDefault
to exclude
does the opposite.
You can exclude individual CRs from the /source-crs/extra-manifest
folder that are otherwise included by default. The following example configures a custom single-node OpenShift SiteConfig
CR to exclude the /source-crs/extra-manifest/03-sctp-machine-config-worker.yaml
CR at installation time.
Some additional optional filtering scenarios are also described.
Prerequisites
- You configured the hub cluster for generating the required installation and policy CRs.
- You created a Git repository where you manage your custom site configuration data. The repository must be accessible from the hub cluster and be defined as a source repository for the Argo CD application.
Procedure
To prevent the GitOps ZTP pipeline from applying the
03-sctp-machine-config-worker.yaml
CR file, apply the following YAML in theSiteConfig
CR:apiVersion: ran.openshift.io/v1 kind: SiteConfig metadata: name: "site1-sno-du" namespace: "site1-sno-du" spec: baseDomain: "example.com" pullSecretRef: name: "assisted-deployment-pull-secret" clusterImageSetNameRef: "openshift-4.17" sshPublicKey: "<ssh_public_key>" clusters: - clusterName: "site1-sno-du" extraManifests: filter: exclude: - 03-sctp-machine-config-worker.yaml
The GitOps ZTP pipeline skips the
03-sctp-machine-config-worker.yaml
CR during installation. All other CRs in/source-crs/extra-manifest
are applied.Save the
SiteConfig
CR and push the changes to the site configuration repository.The GitOps ZTP pipeline monitors and adjusts what CRs it applies based on the
SiteConfig
filter instructions.Optional: To prevent the GitOps ZTP pipeline from applying all the
/source-crs/extra-manifest
CRs during cluster installation, apply the following YAML in theSiteConfig
CR:- clusterName: "site1-sno-du" extraManifests: filter: inclusionDefault: exclude
Optional: To exclude all the
/source-crs/extra-manifest
RAN CRs and instead include a custom CR file during installation, edit the customSiteConfig
CR to set the custom manifests folder and theinclude
file, for example:clusters: - clusterName: "site1-sno-du" extraManifestPath: "<custom_manifest_folder>" 1 extraManifests: filter: inclusionDefault: exclude 2 include: - custom-sctp-machine-config-worker.yaml
The following example illustrates the custom folder structure:
siteconfig ├── site1-sno-du.yaml └── user-custom-manifest └── custom-sctp-machine-config-worker.yaml
8.3. Deleting a node by using the SiteConfig CR
By using a SiteConfig
custom resource (CR), you can delete and reprovision a node. This method is more efficient than manually deleting the node.
Prerequisites
- You have configured the hub cluster to generate the required installation and policy CRs.
- You have created a Git repository in which you can manage your custom site configuration data. The repository must be accessible from the hub cluster and be defined as the source repository for the Argo CD application.
Procedure
Update the
SiteConfig
CR to include thebmac.agent-install.openshift.io/remove-agent-and-node-on-delete=true
annotation and push the changes to the Git repository:apiVersion: ran.openshift.io/v1 kind: SiteConfig metadata: name: "cnfdf20" namespace: "cnfdf20" spec: clusters: nodes: - hostname: node6 role: "worker" crAnnotations: add: BareMetalHost: bmac.agent-install.openshift.io/remove-agent-and-node-on-delete: true # ...
Verify that the
BareMetalHost
object is annotated by running the following command:oc get bmh -n <managed-cluster-namespace> <bmh-object> -ojsonpath='{.metadata}' | jq -r '.annotations["bmac.agent-install.openshift.io/remove-agent-and-node-on-delete"]'
Example output
true
Suppress the generation of the
BareMetalHost
CR by updating theSiteConfig
CR to include thecrSuppression.BareMetalHost
annotation:apiVersion: ran.openshift.io/v1 kind: SiteConfig metadata: name: "cnfdf20" namespace: "cnfdf20" spec: clusters: - nodes: - hostName: node6 role: "worker" crSuppression: - BareMetalHost # ...
-
Push the changes to the Git repository and wait for deprovisioning to start. The status of the
BareMetalHost
CR should change todeprovisioning
. Wait for theBareMetalHost
to finish deprovisioning, and be fully deleted.
Verification
Verify that the
BareMetalHost
andAgent
CRs for the worker node have been deleted from the hub cluster by running the following commands:$ oc get bmh -n <cluster-ns>
$ oc get agent -n <cluster-ns>
Verify that the node record has been deleted from the spoke cluster by running the following command:
$ oc get nodes
NoteIf you are working with secrets, deleting a secret too early can cause an issue because ArgoCD needs the secret to complete resynchronization after deletion. Delete the secret only after the node cleanup, when the current ArgoCD synchronization is complete.
Next Steps
To reprovision a node, delete the changes previously added to the SiteConfig
, push the changes to the Git repository, and wait for the synchronization to complete. This regenerates the BareMetalHost
CR of the worker node and triggers the re-install of the node.
Chapter 9. Managing cluster polices with PolicyGenerator resources
9.1. Configuring managed cluster policies by using PolicyGenerator resources
Applied Policy
custom resources (CRs) configure the managed clusters that you provision. You can customize how Red Hat Advanced Cluster Management (RHACM) uses PolicyGenerator
CRs to generate the applied Policy
CRs.
Using PolicyGenerator resources with GitOps ZTP is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.
For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.
For more information about PolicyGenerator
resources, see the RHACM Policy Generator documentation.
9.1.1. Comparing RHACM PolicyGenerator and PolicyGenTemplate resource patching
PolicyGenerator
custom resources (CRs) and PolicyGenTemplate
CRs can be used in GitOps ZTP to generate RHACM policies for managed clusters.
There are advantages to using PolicyGenerator
CRs over PolicyGenTemplate
CRs when it comes to patching OpenShift Container Platform resources with GitOps ZTP. Using the RHACM PolicyGenerator
API provides a generic way of patching resources which is not possible with PolicyGenTemplate
resources.
The PolicyGenerator
API is a part of the Open Cluster Management standard, while the PolicyGenTemplate
API is not. A comparison of PolicyGenerator
and PolicyGenTemplate
resource patching and placement strategies are described in the following table.
Using PolicyGenTemplate
CRs to manage and deploy polices to managed clusters will be deprecated in an upcoming OpenShift Container Platform release. Equivalent and improved functionality is available using Red Hat Advanced Cluster Management (RHACM) and PolicyGenerator
CRs.
For more information about PolicyGenerator
resources, see the RHACM Policy Generator documentation.
PolicyGenerator patching | PolicyGenTemplate patching |
---|---|
Uses Kustomize strategic merges for merging resources. For more information see Declarative Management of Kubernetes Objects Using Kustomize. | Works by replacing variables with their values as defined by the patch. This is less flexible than Kustomize merge strategies. |
Supports |
Does not support |
Relies only on patching, no embedded variable substitution is required. | Overwrites variable values defined in the patch. |
Does not support merging lists in merge patches. Replacing a list in a merge patch is supported. | Merging and replacing lists is supported in a limited fashion - you can only merge one object in the list. |
Does not currently support the OpenAPI specification for resource patching. This means that additional directives are required in the patch to merge content that does not follow a schema, for example, | Works by replacing fields and values with values as defined by the patch. |
Requires additional directives, for example, |
Substitutes fields and values defined in the source CR with values defined in the patch, for example |
Can patch the |
Can patch the |
9.1.2. About the PolicyGenerator CRD
The PolicyGenerator
custom resource definition (CRD) tells the PolicyGen
policy generator what custom resources (CRs) to include in the cluster configuration, how to combine the CRs into the generated policies, and what items in those CRs need to be updated with overlay content.
The following example shows a PolicyGenerator
CR (acm-common-du-ranGen.yaml
) extracted from the ztp-site-generate
reference container. The acm-common-du-ranGen.yaml
file defines two Red Hat Advanced Cluster Management (RHACM) policies. The polices manage a collection of configuration CRs, one for each unique value of policyName
in the CR. acm-common-du-ranGen.yaml
creates a single placement binding and a placement rule to bind the policies to clusters based on the labels listed in the policyDefaults.placement.labelSelector
section.
Example PolicyGenerator CR - acm-common-ranGen.yaml
apiVersion: policy.open-cluster-management.io/v1 kind: PolicyGenerator metadata: name: common-latest placementBindingDefaults: name: common-latest-placement-binding 1 policyDefaults: namespace: ztp-common placement: labelSelector: matchExpressions: - key: common operator: In values: - "true" - key: du-profile operator: In values: - latest remediationAction: inform severity: low namespaceSelector: exclude: - kube-* include: - '*' evaluationInterval: compliant: 10m noncompliant: 10s policies: - name: common-latest-config-policy policyAnnotations: ran.openshift.io/ztp-deploy-wave: "1" manifests: - path: source-crs/ReduceMonitoringFootprint.yaml - path: source-crs/DefaultCatsrc.yaml 2 patches: - metadata: name: redhat-operators-disconnected spec: displayName: disconnected-redhat-operators image: registry.example.com:5000/disconnected-redhat-operators/disconnected-redhat-operator-index:v4.9 - path: source-crs/DisconnectedICSP.yaml patches: - spec: repositoryDigestMirrors: - mirrors: - registry.example.com:5000 source: registry.redhat.io - name: common-latest-subscriptions-policy policyAnnotations: ran.openshift.io/ztp-deploy-wave: "2" manifests: 3 - path: source-crs/SriovSubscriptionNS.yaml - path: source-crs/SriovSubscriptionOperGroup.yaml - path: source-crs/SriovSubscription.yaml - path: source-crs/SriovOperatorStatus.yaml - path: source-crs/PtpSubscriptionNS.yaml - path: source-crs/PtpSubscriptionOperGroup.yaml - path: source-crs/PtpSubscription.yaml - path: source-crs/PtpOperatorStatus.yaml - path: source-crs/ClusterLogNS.yaml - path: source-crs/ClusterLogOperGroup.yaml - path: source-crs/ClusterLogSubscription.yaml - path: source-crs/ClusterLogOperatorStatus.yaml - path: source-crs/StorageNS.yaml - path: source-crs/StorageOperGroup.yaml - path: source-crs/StorageSubscription.yaml - path: source-crs/StorageOperatorStatus.yaml
A PolicyGenerator
CR can be constructed with any number of included CRs. Apply the following example CR in the hub cluster to generate a policy containing a single CR:
apiVersion: policy.open-cluster-management.io/v1 kind: PolicyGenerator metadata: name: group-du-sno placementBindingDefaults: name: group-du-sno-placement-binding policyDefaults: namespace: ztp-group placement: labelSelector: matchExpressions: - key: group-du-sno operator: Exists remediationAction: inform severity: low namespaceSelector: exclude: - kube-* include: - '*' evaluationInterval: compliant: 10m noncompliant: 10s policies: - name: group-du-sno-config-policy policyAnnotations: ran.openshift.io/ztp-deploy-wave: '10' manifests: - path: source-crs/PtpConfigSlave-MCP-master.yaml patches: - metadata: null name: du-ptp-slave namespace: openshift-ptp annotations: ran.openshift.io/ztp-deploy-wave: '10' spec: profile: - name: slave interface: $interface ptp4lOpts: '-2 -s' phc2sysOpts: '-a -r -n 24' ptpSchedulingPolicy: SCHED_FIFO ptpSchedulingPriority: 10 ptpSettings: logReduce: 'true' ptp4lConf: | [global] # # Default Data Set # twoStepFlag 1 slaveOnly 1 priority1 128 priority2 128 domainNumber 24 #utc_offset 37 clockClass 255 clockAccuracy 0xFE offsetScaledLogVariance 0xFFFF free_running 0 freq_est_interval 1 dscp_event 0 dscp_general 0 dataset_comparison G.8275.x G.8275.defaultDS.localPriority 128 # # Port Data Set # logAnnounceInterval -3 logSyncInterval -4 logMinDelayReqInterval -4 logMinPdelayReqInterval -4 announceReceiptTimeout 3 syncReceiptTimeout 0 delayAsymmetry 0 fault_reset_interval -4 neighborPropDelayThresh 20000000 masterOnly 0 G.8275.portDS.localPriority 128 # # Run time options # assume_two_step 0 logging_level 6 path_trace_enabled 0 follow_up_info 0 hybrid_e2e 0 inhibit_multicast_service 0 net_sync_monitor 0 tc_spanning_tree 0 tx_timestamp_timeout 50 unicast_listen 0 unicast_master_table 0 unicast_req_duration 3600 use_syslog 1 verbose 0 summary_interval 0 kernel_leap 1 check_fup_sync 0 clock_class_threshold 7 # # Servo Options # pi_proportional_const 0.0 pi_integral_const 0.0 pi_proportional_scale 0.0 pi_proportional_exponent -0.3 pi_proportional_norm_max 0.7 pi_integral_scale 0.0 pi_integral_exponent 0.4 pi_integral_norm_max 0.3 step_threshold 2.0 first_step_threshold 0.00002 max_frequency 900000000 clock_servo pi sanity_freq_limit 200000000 ntpshm_segment 0 # # Transport options # transportSpecific 0x0 ptp_dst_mac 01:1B:19:00:00:00 p2p_dst_mac 01:80:C2:00:00:0E udp_ttl 1 udp6_scope 0x0E uds_address /var/run/ptp4l # # Default interface options # clock_type OC network_transport L2 delay_mechanism E2E time_stamping hardware tsproc_mode filter delay_filter moving_median delay_filter_length 10 egressLatency 0 ingressLatency 0 boundary_clock_jbod 0 # # Clock description # productDescription ;; revisionData ;; manufacturerIdentity 00:00:00 userDescription ; timeSource 0xA0 recommend: - profile: slave priority: 4 match: - nodeLabel: node-role.kubernetes.io/master
Using the source file PtpConfigSlave.yaml
as an example, the file defines a PtpConfig
CR. The generated policy for the PtpConfigSlave
example is named group-du-sno-config-policy
. The PtpConfig
CR defined in the generated group-du-sno-config-policy
is named du-ptp-slave
. The spec
defined in PtpConfigSlave.yaml
is placed under du-ptp-slave
along with the other spec
items defined under the source file.
The following example shows the group-du-sno-config-policy
CR:
--- apiVersion: policy.open-cluster-management.io/v1 kind: PolicyGenerator metadata: name: du-upgrade placementBindingDefaults: name: du-upgrade-placement-binding policyDefaults: namespace: ztp-group-du-sno placement: labelSelector: matchExpressions: - key: group-du-sno operator: Exists remediationAction: inform severity: low namespaceSelector: exclude: - kube-* include: - '*' evaluationInterval: compliant: 10m noncompliant: 10s policies: - name: du-upgrade-operator-catsrc-policy policyAnnotations: ran.openshift.io/ztp-deploy-wave: "1" manifests: - path: source-crs/DefaultCatsrc.yaml patches: - metadata: name: redhat-operators spec: displayName: Red Hat Operators Catalog image: registry.example.com:5000/olm/redhat-operators:v4.14 updateStrategy: registryPoll: interval: 1h status: connectionState: lastObservedState: READY
9.1.3. Recommendations when customizing PolicyGenerator CRs
Consider the following best practices when customizing site configuration PolicyGenerator
custom resources (CRs):
-
Use as few policies as are necessary. Using fewer policies requires less resources. Each additional policy creates increased CPU load for the hub cluster and the deployed managed cluster. CRs are combined into policies based on the
policyName
field in thePolicyGenerator
CR. CRs in the samePolicyGenerator
which have the same value forpolicyName
are managed under a single policy. -
In disconnected environments, use a single catalog source for all Operators by configuring the registry as a single index containing all Operators. Each additional
CatalogSource
CR on the managed clusters increases CPU usage. -
MachineConfig
CRs should be included asextraManifests
in theSiteConfig
CR so that they are applied during installation. This can reduce the overall time taken until the cluster is ready to deploy applications. -
PolicyGenerator
CRs should override the channel field to explicitly identify the desired version. This ensures that changes in the source CR during upgrades does not update the generated subscription.
Additional resources
- For recommendations about scaling clusters with RHACM, see Performance and scalability.
When managing large numbers of spoke clusters on the hub cluster, minimize the number of policies to reduce resource consumption.
Grouping multiple configuration CRs into a single or limited number of policies is one way to reduce the overall number of policies on the hub cluster. When using the common, group, and site hierarchy of policies for managing site configuration, it is especially important to combine site-specific configuration into a single policy.
9.1.4. PolicyGenerator CRs for RAN deployments
Use PolicyGenerator
custom resources (CRs) to customize the configuration applied to the cluster by using the GitOps Zero Touch Provisioning (ZTP) pipeline. The PolicyGenerator
CR allows you to generate one or more policies to manage the set of configuration CRs on your fleet of clusters. The PolicyGenerator
CR identifies the set of managed CRs, bundles them into policies, builds the policy wrapping around those CRs, and associates the policies with clusters by using label binding rules.
The reference configuration, obtained from the GitOps ZTP container, is designed to provide a set of critical features and node tuning settings that ensure the cluster can support the stringent performance and resource utilization constraints typical of RAN (Radio Access Network) Distributed Unit (DU) applications. Changes or omissions from the baseline configuration can affect feature availability, performance, and resource utilization. Use the reference PolicyGenerator
CRs as the basis to create a hierarchy of configuration files tailored to your specific site requirements.
The baseline PolicyGenerator
CRs that are defined for RAN DU cluster configuration can be extracted from the GitOps ZTP ztp-site-generate
container. See "Preparing the GitOps ZTP site configuration repository" for further details.
The PolicyGenerator
CRs can be found in the ./out/argocd/example/acmpolicygenerator/
folder. The reference architecture has common, group, and site-specific configuration CRs. Each PolicyGenerator
CR refers to other CRs that can be found in the ./out/source-crs
folder.
The PolicyGenerator
CRs relevant to RAN cluster configuration are described below. Variants are provided for the group PolicyGenerator
CRs to account for differences in single-node, three-node compact, and standard cluster configurations. Similarly, site-specific configuration variants are provided for single-node clusters and multi-node (compact or standard) clusters. Use the group and site-specific configuration variants that are relevant for your deployment.
PolicyGenerator CR | Description |
---|---|
| Contains a set of CRs that get applied to multi-node clusters. These CRs configure SR-IOV features typical for RAN installations. |
| Contains a set of CRs that get applied to single-node OpenShift clusters. These CRs configure SR-IOV features typical for RAN installations. |
| Contains a set of common RAN policy configuration that get applied to multi-node clusters. |
| Contains a set of common RAN CRs that get applied to all clusters. These CRs subscribe to a set of operators providing cluster features typical for RAN as well as baseline cluster tuning. |
| Contains the RAN policies for three-node clusters only. |
| Contains the RAN policies for single-node clusters only. |
| Contains the RAN policies for standard three control-plane clusters. |
|
|
|
|
|
|
Additional resources
9.1.5. Customizing a managed cluster with PolicyGenerator CRs
Use the following procedure to customize the policies that get applied to the managed cluster that you provision using the GitOps Zero Touch Provisioning (ZTP) pipeline.
Prerequisites
-
You have installed the OpenShift CLI (
oc
). -
You have logged in to the hub cluster as a user with
cluster-admin
privileges. - You configured the hub cluster for generating the required installation and policy CRs.
- You created a Git repository where you manage your custom site configuration data. The repository must be accessible from the hub cluster and be defined as a source repository for the Argo CD application.
Procedure
Create a
PolicyGenerator
CR for site-specific configuration CRs.-
Choose the appropriate example for your CR from the
out/argocd/example/acmpolicygenerator/
folder, for example,acm-example-sno-site.yaml
oracm-example-multinode-site.yaml
. Change the
policyDefaults.placement.labelSelector
field in the example file to match the site-specific label included in theSiteConfig
CR. In the exampleSiteConfig
file, the site-specific label issites: example-sno
.NoteEnsure that the labels defined in your
PolicyGenerator
policyDefaults.placement.labelSelector
field correspond to the labels that are defined in the related managed clustersSiteConfig
CR.- Change the content in the example file to match the desired configurati
-
Choose the appropriate example for your CR from the