Chapter 19. Clusters at the network far edge
19.1. Challenges of the network far edge Copy linkLink copied to clipboard!
Edge computing presents complex challenges when managing many sites in geographically displaced locations. Use zero touch provisioning (ZTP) and GitOps to provision and manage sites at the far edge of the network.
19.1.1. Overcoming the challenges of the network far edge Copy linkLink copied to clipboard!
Today, service providers want to deploy their infrastructure at the edge of the network. This presents significant challenges:
- How do you handle deployments of many edge sites in parallel?
- What happens when you need to deploy sites in disconnected environments?
- How do you manage the lifecycle of large fleets of clusters?
Zero touch provisioning (ZTP) and GitOps meets these challenges by allowing you to provision remote edge sites at scale with declarative site definitions and configurations for bare-metal equipment. Template or overlay configurations install OpenShift Container Platform features that are required for CNF workloads. The full lifecycle of installation and upgrades is handled through the ZTP pipeline.
ZTP uses GitOps for infrastructure deployments. With GitOps, you use declarative YAML files and other defined patterns stored in Git repositories. Red Hat Advanced Cluster Management (RHACM) uses your Git repositories to drive the deployment of your infrastructure.
GitOps provides traceability, role-based access control (RBAC), and a single source of truth for the desired state of each site. Scalability issues are addressed by Git methodologies and event driven operations through webhooks.
You start the ZTP workflow by creating declarative site definition and configuration custom resources (CRs) that the ZTP pipeline delivers to the edge nodes.
The following diagram shows how ZTP works within the far edge framework.
19.1.2. Using ZTP to provision clusters at the network far edge Copy linkLink copied to clipboard!
Red Hat Advanced Cluster Management (RHACM) manages clusters in a hub-and-spoke architecture, where a single hub cluster manages many spoke clusters. Hub clusters running RHACM provision and deploy the managed clusters by using zero touch provisioning (ZTP) and the assisted service that is deployed when you install RHACM.
The assisted service handles provisioning of OpenShift Container Platform on single node clusters, three-node clusters, or standard clusters running on bare metal.
A high-level overview of using ZTP to provision and maintain bare-metal hosts with OpenShift Container Platform is as follows:
- A hub cluster running RHACM manages an OpenShift image registry that mirrors the OpenShift Container Platform release images. RHACM uses the OpenShift image registry to provision the managed clusters.
- You manage the bare-metal hosts in a YAML format inventory file, versioned in a Git repository.
- You make the hosts ready for provisioning as managed clusters, and use RHACM and the assisted service to install the bare-metal hosts on site.
Installing and deploying the clusters is a two-stage process, involving an initial installation phase, and a subsequent configuration phase. The following diagram illustrates this workflow:
19.1.3. Installing managed clusters with SiteConfig resources and RHACM Copy linkLink copied to clipboard!
GitOps ZTP uses SiteConfig
custom resources (CRs) in a Git repository to manage the processes that install OpenShift Container Platform clusters. The SiteConfig
CR contains cluster-specific parameters required for installation. It has options for applying select configuration CRs during installation including user defined extra manifests.
The ZTP GitOps plugin processes SiteConfig
CRs to generate a collection of CRs on the hub cluster. This triggers the assisted service in Red Hat Advanced Cluster Management (RHACM) to install OpenShift Container Platform on the bare-metal host. You can find installation status and error messages in these CRs on the hub cluster.
You can provision single clusters manually or in batches with ZTP:
- Provisioning a single cluster
-
Create a single
SiteConfig
CR and related installation and configuration CRs for the cluster, and apply them in the hub cluster to begin cluster provisioning. This is a good way to test your CRs before deploying on a larger scale. - Provisioning many clusters
-
Install managed clusters in batches of up to 400 by defining
SiteConfig
and related CRs in a Git repository. ArgoCD uses theSiteConfig
CRs to deploy the sites. The RHACM policy generator creates the manifests and applies them to the hub cluster. This starts the cluster provisioning process.
19.1.4. Configuring managed clusters with policies and PolicyGenTemplate resources Copy linkLink copied to clipboard!
Zero touch provisioning (ZTP) uses Red Hat Advanced Cluster Management (RHACM) to configure clusters by using a policy-based governance approach to applying the configuration.
The policy generator or PolicyGen
is a plugin for the GitOps Operator that enables the creation of RHACM policies from a concise template. The tool can combine multiple CRs into a single policy, and you can generate multiple policies that apply to various subsets of clusters in your fleet.
For scalability and to reduce the complexity of managing configurations across the fleet of clusters, use configuration CRs with as much commonality as possible.
- Where possible, apply configuration CRs using a fleet-wide common policy.
- The next preference is to create logical groupings of clusters to manage as much of the remaining configurations as possible under a group policy.
- When a configuration is unique to an individual site, use RHACM templating on the hub cluster to inject the site-specific data into a common or group policy. Alternatively, apply an individual site policy for the site.
The following diagram shows how the policy generator interacts with GitOps and RHACM in the configuration phase of cluster deployment.
For large fleets of clusters, it is typical for there to be a high-level of consistency in the configuration of those clusters.
The following recommended structuring of policies combines configuration CRs to meet several goals:
- Describe common configurations once and apply to the fleet.
- Minimize the number of maintained and managed policies.
- Support flexibility in common configurations for cluster variants.
Policy category | Description |
---|---|
Common |
A policy that exists in the common category is applied to all clusters in the fleet. Use common |
Groups |
A policy that exists in the groups category is applied to a group of clusters in the fleet. Use group |
Sites | A policy that exists in the sites category is applied to a specific cluster site. Any cluster can have its own specific policies maintained. |
19.2. Preparing the hub cluster for ZTP Copy linkLink copied to clipboard!
To use RHACM in a disconnected environment, create a mirror registry that mirrors the OpenShift Container Platform release images and Operator Lifecycle Manager (OLM) catalog that contains the required Operator images. OLM manages, installs, and upgrades Operators and their dependencies in the cluster. You can also use a disconnected mirror host to serve the RHCOS ISO and RootFS disk images that are used to provision the bare-metal hosts.
19.2.1. Telco RAN 4.12 validated solution software versions Copy linkLink copied to clipboard!
The Red Hat Telco Radio Access Network (RAN) version 4.12 solution has been validated using the following Red Hat software products.
Product | Software version |
---|---|
Hub cluster OpenShift Container Platform version | 4.12 |
GitOps ZTP plugin | 4.10, 4.11, or 4.12 |
Red Hat Advanced Cluster Management (RHACM) | 2.6, 2.7 |
Red Hat OpenShift GitOps | 1.9, 1.10 |
Topology Aware Lifecycle Manager (TALM) | 4.10, 4.11, or 4.12 |
19.2.2. Installing GitOps ZTP in a disconnected environment Copy linkLink copied to clipboard!
Use Red Hat Advanced Cluster Management (RHACM), Red Hat OpenShift GitOps, and Topology Aware Lifecycle Manager (TALM) on the hub cluster in the disconnected environment to manage the deployment of multiple managed clusters.
Prerequisites
-
You have installed the OpenShift Container Platform CLI (
oc
). -
You have logged in as a user with
cluster-admin
privileges. You have configured a disconnected mirror registry for use in the cluster.
NoteThe disconnected mirror registry that you create must contain a version of TALM backup and pre-cache images that matches the version of TALM running in the hub cluster. The spoke cluster must be able to resolve these images in the disconnected mirror registry.
Procedure
- Install RHACM in the hub cluster. See Installing RHACM in a disconnected environment.
- Install GitOps and TALM in the hub cluster.
19.2.3. Adding RHCOS ISO and RootFS images to the disconnected mirror host Copy linkLink copied to clipboard!
Before you begin installing clusters in the disconnected environment with Red Hat Advanced Cluster Management (RHACM), you must first host Red Hat Enterprise Linux CoreOS (RHCOS) images for it to use. Use a disconnected mirror to host the RHCOS images.
Prerequisites
- Deploy and configure an HTTP server to host the RHCOS image resources on the network. You must be able to access the HTTP server from your computer, and from the machines that you create.
The RHCOS images might not change with every release of OpenShift Container Platform. You must download images with the highest version that is less than or equal to the version that you install. Use the image versions that match your OpenShift Container Platform version if they are available. You require ISO and RootFS images to install RHCOS on the hosts. RHCOS QCOW2 images are not supported for this installation type.
Procedure
- Log in to the mirror host.
Obtain the RHCOS ISO and RootFS images from mirror.openshift.com, for example:
Export the required image names and OpenShift Container Platform version as environment variables:
export ISO_IMAGE_NAME=<iso_image_name>
$ export ISO_IMAGE_NAME=<iso_image_name>
1 Copy to Clipboard Copied! Toggle word wrap Toggle overflow export ROOTFS_IMAGE_NAME=<rootfs_image_name>
$ export ROOTFS_IMAGE_NAME=<rootfs_image_name>
1 Copy to Clipboard Copied! Toggle word wrap Toggle overflow export OCP_VERSION=<ocp_version>
$ export OCP_VERSION=<ocp_version>
1 Copy to Clipboard Copied! Toggle word wrap Toggle overflow Download the required images:
sudo wget https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/4.12/${OCP_VERSION}/${ISO_IMAGE_NAME} -O /var/www/html/${ISO_IMAGE_NAME}
$ sudo wget https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/4.12/${OCP_VERSION}/${ISO_IMAGE_NAME} -O /var/www/html/${ISO_IMAGE_NAME}
Copy to Clipboard Copied! Toggle word wrap Toggle overflow sudo wget https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/4.12/${OCP_VERSION}/${ROOTFS_IMAGE_NAME} -O /var/www/html/${ROOTFS_IMAGE_NAME}
$ sudo wget https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/4.12/${OCP_VERSION}/${ROOTFS_IMAGE_NAME} -O /var/www/html/${ROOTFS_IMAGE_NAME}
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
Verification steps
Verify that the images downloaded successfully and are being served on the disconnected mirror host, for example:
wget http://$(hostname)/${ISO_IMAGE_NAME}
$ wget http://$(hostname)/${ISO_IMAGE_NAME}
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
Saving to: rhcos-4.12.1-x86_64-live.x86_64.iso rhcos-4.12.1-x86_64-live.x86_64.iso- 11%[====> ] 10.01M 4.71MB/s
Saving to: rhcos-4.12.1-x86_64-live.x86_64.iso rhcos-4.12.1-x86_64-live.x86_64.iso- 11%[====> ] 10.01M 4.71MB/s
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
19.2.4. Enabling the assisted service Copy linkLink copied to clipboard!
Red Hat Advanced Cluster Management (RHACM) uses the assisted service to deploy OpenShift Container Platform clusters. The assisted service is deployed automatically when you enable the MultiClusterHub Operator on Red Hat Advanced Cluster Management (RHACM). After that, you need to configure the Provisioning
resource to watch all namespaces and to update the AgentServiceConfig
custom resource (CR) with references to the ISO and RootFS images that are hosted on the mirror registry HTTP server.
Prerequisites
-
You have installed the OpenShift CLI (
oc
). -
You have logged in to the hub cluster as a user with
cluster-admin
privileges. - You have RHACM with MultiClusterHub enabled.
Procedure
-
Enable the
Provisioning
resource to watch all namespaces and configure mirrors for disconnected environments. For more information, see Enabling the Central Infrastructure Management service. Update the
AgentServiceConfig
CR by running the following command:oc edit AgentServiceConfig
$ oc edit AgentServiceConfig
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Add the following entry to the
items.spec.osImages
field in the CR:- cpuArchitecture: x86_64 openshiftVersion: "4.12" rootFSUrl: https://<host>/<path>/rhcos-live-rootfs.x86_64.img url: https://<host>/<path>/rhcos-live.x86_64.iso
- cpuArchitecture: x86_64 openshiftVersion: "4.12" rootFSUrl: https://<host>/<path>/rhcos-live-rootfs.x86_64.img url: https://<host>/<path>/rhcos-live.x86_64.iso
Copy to Clipboard Copied! Toggle word wrap Toggle overflow where:
- <host>
- Is the fully qualified domain name (FQDN) for the target mirror registry HTTP server.
- <path>
- Is the path to the image on the target mirror registry.
Save and quit the editor to apply the changes.
19.2.5. Configuring the hub cluster to use a disconnected mirror registry Copy linkLink copied to clipboard!
You can configure the hub cluster to use a disconnected mirror registry for a disconnected environment.
Prerequisites
- You have a disconnected hub cluster installation with Red Hat Advanced Cluster Management (RHACM) 2.7 installed.
-
You have hosted the
rootfs
andiso
images on an HTTP server. See the Additional resources section for guidance about Mirroring the OpenShift Container Platform image repository.
If you enable TLS for the HTTP server, you must confirm the root certificate is signed by an authority trusted by the client and verify the trusted certificate chain between your OpenShift Container Platform hub and managed clusters and the HTTP server. Using a server configured with an untrusted certificate prevents the images from being downloaded to the image creation service. Using untrusted HTTPS servers is not supported.
Procedure
Create a
ConfigMap
containing the mirror registry config:Copy to Clipboard Copied! Toggle word wrap Toggle overflow - 1
- The
ConfigMap
namespace must be set tomulticluster-engine
. - 2
- The mirror registry’s certificate used when creating the mirror registry.
- 3
- The configuration file for the mirror registry. The mirror registry configuration adds mirror information to
/etc/containers/registries.conf
in the Discovery image. The mirror information is stored in theimageContentSources
section of theinstall-config.yaml
file when passed to the installation program. The Assisted Service pod running on the HUB cluster fetches the container images from the configured mirror registry. - 4
- The URL of the mirror registry. You must use the URL from the
imageContentSources
section by running theoc adm release mirror
command when you configure the mirror registry. For more information, see the Mirroring the OpenShift Container Platform image repository section.
This updates
mirrorRegistryRef
in theAgentServiceConfig
custom resource, as shown below:Example output
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
A valid NTP server is required during cluster installation. Ensure that a suitable NTP server is available and can be reached from the installed clusters through the disconnected network.
19.2.6. Configuring the hub cluster to use unauthenticated registries Copy linkLink copied to clipboard!
You can configure the hub cluster to use unauthenticated registries. Unauthenticated registries does not require authentication to access and download images.
Prerequisites
- You have installed and configured a hub cluster and installed Red Hat Advanced Cluster Management (RHACM) on the hub cluster.
- You have installed the OpenShift Container Platform CLI (oc).
-
You have logged in as a user with
cluster-admin
privileges. - You have configured an unauthenticated registry for use with the hub cluster.
Procedure
Update the
AgentServiceConfig
custom resource (CR) by running the following command:oc edit AgentServiceConfig agent
$ oc edit AgentServiceConfig agent
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Add the
unauthenticatedRegistries
field in the CR:Copy to Clipboard Copied! Toggle word wrap Toggle overflow Unauthenticated registries are listed under
spec.unauthenticatedRegistries
in theAgentServiceConfig
resource. Any registry on this list is not required to have an entry in the pull secret used for the spoke cluster installation.assisted-service
validates the pull secret by making sure it contains the authentication information for every image registry used for installation.
Mirror registries are automatically added to the ignore list and do not need to be added under spec.unauthenticatedRegistries
. Specifying the PUBLIC_CONTAINER_REGISTRIES
environment variable in the ConfigMap
overrides the default values with the specified value. The PUBLIC_CONTAINER_REGISTRIES
defaults are quay.io and registry.svc.ci.openshift.org.
Verification
Verify that you can access the newly added registry from the hub cluster by running the following commands:
Open a debug shell prompt to the hub cluster:
oc debug node/<node_name>
$ oc debug node/<node_name>
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Test access to the unauthenticated registry by running the following command:
podman login -u kubeadmin -p $(oc whoami -t) <unauthenticated_registry>
sh-4.4# podman login -u kubeadmin -p $(oc whoami -t) <unauthenticated_registry>
Copy to Clipboard Copied! Toggle word wrap Toggle overflow where:
- <unauthenticated_registry>
-
Is the new registry, for example,
unauthenticated-image-registry.openshift-image-registry.svc:5000
.
Example output
Login Succeeded!
Login Succeeded!
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
19.2.7. Configuring the hub cluster with ArgoCD Copy linkLink copied to clipboard!
You can configure the hub cluster with a set of ArgoCD applications that generate the required installation and policy custom resources (CRs) for each site with GitOps zero touch provisioning (ZTP).
Red Hat Advanced Cluster Management (RHACM) uses SiteConfig
CRs to generate the Day 1 managed cluster installation CRs for ArgoCD. Each ArgoCD application can manage a maximum of 300 SiteConfig
CRs.
Prerequisites
- You have a OpenShift Container Platform hub cluster with Red Hat Advanced Cluster Management (RHACM) and Red Hat OpenShift GitOps installed.
-
You have extracted the reference deployment from the ZTP GitOps plugin container as described in the "Preparing the GitOps ZTP site configuration repository" section. Extracting the reference deployment creates the
out/argocd/deployment
directory referenced in the following procedure.
Procedure
Prepare the ArgoCD pipeline configuration:
- Create a Git repository with the directory structure similar to the example directory. For more information, see "Preparing the GitOps ZTP site configuration repository".
Configure access to the repository using the ArgoCD UI. Under Settings configure the following:
-
Repositories - Add the connection information. The URL must end in
.git
, for example,https://repo.example.com/repo.git
and credentials. - Certificates - Add the public certificate for the repository, if needed.
-
Repositories - Add the connection information. The URL must end in
Modify the two ArgoCD applications,
out/argocd/deployment/clusters-app.yaml
andout/argocd/deployment/policies-app.yaml
, based on your Git repository:-
Update the URL to point to the Git repository. The URL ends with
.git
, for example,https://repo.example.com/repo.git
. -
The
targetRevision
indicates which Git repository branch to monitor. -
path
specifies the path to theSiteConfig
andPolicyGenTemplate
CRs, respectively.
-
Update the URL to point to the Git repository. The URL ends with
To install the ZTP GitOps plugin you must patch the ArgoCD instance in the hub cluster by using the patch file previously extracted into the
out/argocd/deployment/
directory. Run the following command:oc patch argocd openshift-gitops \ -n openshift-gitops --type=merge \ --patch-file out/argocd/deployment/argocd-openshift-gitops-patch.json
$ oc patch argocd openshift-gitops \ -n openshift-gitops --type=merge \ --patch-file out/argocd/deployment/argocd-openshift-gitops-patch.json
Copy to Clipboard Copied! Toggle word wrap Toggle overflow NoteFor a disconnected environment, amend the
out/argocd/deployment/argocd-openshift-gitops-patch.json
file with theztp-site-generate
image mirrored in your local registry. Run the following command:oc patch argocd openshift-gitops -n openshift-gitops --type='json' \ -p='[{"op": "replace", "path": "/spec/repo/initContainers/0/image", \ "value": "<local_registry>/<ztp_site_generate_image_ref>"}]'
$ oc patch argocd openshift-gitops -n openshift-gitops --type='json' \ -p='[{"op": "replace", "path": "/spec/repo/initContainers/0/image", \ "value": "<local_registry>/<ztp_site_generate_image_ref>"}]'
Copy to Clipboard Copied! Toggle word wrap Toggle overflow where:
- <local_registry>
-
Is the URL of the disconnected registry, for example,
my.local.registry:5000
- <ztp-site-generate-image-ref>
-
Is the path to the mirrored
ztp-site-generate
image in the local registry, for exampleopenshift4-ztp-site-generate:custom
.
In RHACM 2.7 and later, the multicluster engine enables the
cluster-proxy-addon
feature by default. To disable this feature, apply the following patch to disable and remove the relevant hub cluster and managed cluster pods that are responsible for this add-on.oc patch multiclusterengines.multicluster.openshift.io multiclusterengine --type=merge --patch-file out/argocd/deployment/disable-cluster-proxy-addon.json
$ oc patch multiclusterengines.multicluster.openshift.io multiclusterengine --type=merge --patch-file out/argocd/deployment/disable-cluster-proxy-addon.json
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Apply the pipeline configuration to your hub cluster by using the following command:
oc apply -k out/argocd/deployment
$ oc apply -k out/argocd/deployment
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
19.2.8. Preparing the GitOps ZTP site configuration repository Copy linkLink copied to clipboard!
Before you can use the ZTP GitOps pipeline, you need to prepare the Git repository to host the site configuration data.
Prerequisites
- You have configured the hub cluster GitOps applications for generating the required installation and policy custom resources (CRs).
- You have deployed the managed clusters using zero touch provisioning (ZTP).
Procedure
-
Create a directory structure with separate paths for the
SiteConfig
andPolicyGenTemplate
CRs. Export the
argocd
directory from theztp-site-generate
container image using the following commands:podman pull registry.redhat.io/openshift4/ztp-site-generate-rhel8:v4.12
$ podman pull registry.redhat.io/openshift4/ztp-site-generate-rhel8:v4.12
Copy to Clipboard Copied! Toggle word wrap Toggle overflow mkdir -p ./out
$ mkdir -p ./out
Copy to Clipboard Copied! Toggle word wrap Toggle overflow podman run --log-driver=none --rm registry.redhat.io/openshift4/ztp-site-generate-rhel8:v4.12 extract /home/ztp --tar | tar x -C ./out
$ podman run --log-driver=none --rm registry.redhat.io/openshift4/ztp-site-generate-rhel8:v4.12 extract /home/ztp --tar | tar x -C ./out
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Check that the
out
directory contains the following subdirectories:-
out/extra-manifest
contains the source CR files thatSiteConfig
uses to generate extra manifestconfigMap
. -
out/source-crs
contains the source CR files thatPolicyGenTemplate
uses to generate the Red Hat Advanced Cluster Management (RHACM) policies. -
out/argocd/deployment
contains patches and YAML files to apply on the hub cluster for use in the next step of this procedure. -
out/argocd/example
contains the examples forSiteConfig
andPolicyGenTemplate
files that represent the recommended configuration.
-
The directory structure under out/argocd/example
serves as a reference for the structure and content of your Git repository. The example includes SiteConfig
and PolicyGenTemplate
reference CRs for single-node, three-node, and standard clusters. Remove references to cluster types that you are not using. The following example describes a set of CRs for a network of single-node clusters:
Keep SiteConfig
and PolicyGenTemplate
CRs in separate directories. Both the SiteConfig
and PolicyGenTemplate
directories must contain a kustomization.yaml
file that explicitly includes the files in that directory.
This directory structure and the kustomization.yaml
files must be committed and pushed to your Git repository. The initial push to Git should include the kustomization.yaml
files. The SiteConfig
(example-sno.yaml
) and PolicyGenTemplate
(common-ranGen.yaml
, group-du-sno*.yaml
, and example-sno-site.yaml
) files can be omitted and pushed at a later time as required when deploying a site.
The KlusterletAddonConfigOverride.yaml
file is only required if one or more SiteConfig
CRs which make reference to it are committed and pushed to Git. See example-sno.yaml
for an example of how this is used.
19.3. Installing managed clusters with RHACM and SiteConfig resources Copy linkLink copied to clipboard!
You can provision OpenShift Container Platform clusters at scale with Red Hat Advanced Cluster Management (RHACM) using the assisted service and the GitOps plugin policy generator with core-reduction technology enabled. The zero touch priovisioning (ZTP) pipeline performs the cluster installations. ZTP can be used in a disconnected environment.
19.3.1. GitOps ZTP and Topology Aware Lifecycle Manager Copy linkLink copied to clipboard!
GitOps zero touch provisioning (ZTP) generates installation and configuration CRs from manifests stored in Git. These artifacts are applied to a centralized hub cluster where Red Hat Advanced Cluster Management (RHACM), the assisted service, and the Topology Aware Lifecycle Manager (TALM) use the CRs to install and configure the managed cluster. The configuration phase of the ZTP pipeline uses the TALM to orchestrate the application of the configuration CRs to the cluster. There are several key integration points between GitOps ZTP and the TALM.
- Inform policies
-
By default, GitOps ZTP creates all policies with a remediation action of
inform
. These policies cause RHACM to report on compliance status of clusters relevant to the policies but does not apply the desired configuration. During the ZTP process, after OpenShift installation, the TALM steps through the createdinform
policies and enforces them on the target managed cluster(s). This applies the configuration to the managed cluster. Outside of the ZTP phase of the cluster lifecycle, this allows you to change policies without the risk of immediately rolling those changes out to affected managed clusters. You can control the timing and the set of remediated clusters by using TALM. - Automatic creation of ClusterGroupUpgrade CRs
To automate the initial configuration of newly deployed clusters, TALM monitors the state of all
ManagedCluster
CRs on the hub cluster. AnyManagedCluster
CR that does not have aztp-done
label applied, including newly createdManagedCluster
CRs, causes the TALM to automatically create aClusterGroupUpgrade
CR with the following characteristics:-
The
ClusterGroupUpgrade
CR is created and enabled in theztp-install
namespace. -
ClusterGroupUpgrade
CR has the same name as theManagedCluster
CR. -
The cluster selector includes only the cluster associated with that
ManagedCluster
CR. -
The set of managed policies includes all policies that RHACM has bound to the cluster at the time the
ClusterGroupUpgrade
is created. - Pre-caching is disabled.
- Timeout set to 4 hours (240 minutes).
The automatic creation of an enabled
ClusterGroupUpgrade
ensures that initial zero-touch deployment of clusters proceeds without the need for user intervention. Additionally, the automatic creation of aClusterGroupUpgrade
CR for anyManagedCluster
without theztp-done
label allows a failed ZTP installation to be restarted by simply deleting theClusterGroupUpgrade
CR for the cluster.-
The
- Waves
Each policy generated from a
PolicyGenTemplate
CR includes aztp-deploy-wave
annotation. This annotation is based on the same annotation from each CR which is included in that policy. The wave annotation is used to order the policies in the auto-generatedClusterGroupUpgrade
CR. The wave annotation is not used other than for the auto-generatedClusterGroupUpgrade
CR.NoteAll CRs in the same policy must have the same setting for the
ztp-deploy-wave
annotation. The default value of this annotation for each CR can be overridden in thePolicyGenTemplate
. The wave annotation in the source CR is used for determining and setting the policy wave annotation. This annotation is removed from each built CR which is included in the generated policy at runtime.The TALM applies the configuration policies in the order specified by the wave annotations. The TALM waits for each policy to be compliant before moving to the next policy. It is important to ensure that the wave annotation for each CR takes into account any prerequisites for those CRs to be applied to the cluster. For example, an Operator must be installed before or concurrently with the configuration for the Operator. Similarly, the
CatalogSource
for an Operator must be installed in a wave before or concurrently with the Operator Subscription. The default wave value for each CR takes these prerequisites into account.Multiple CRs and policies can share the same wave number. Having fewer policies can result in faster deployments and lower CPU usage. It is a best practice to group many CRs into relatively few waves.
To check the default wave value in each source CR, run the following command against the out/source-crs
directory that is extracted from the ztp-site-generate
container image:
grep -r "ztp-deploy-wave" out/source-crs
$ grep -r "ztp-deploy-wave" out/source-crs
- Phase labels
The
ClusterGroupUpgrade
CR is automatically created and includes directives to annotate theManagedCluster
CR with labels at the start and end of the ZTP process.When ZTP configuration postinstallation commences, the
ManagedCluster
has theztp-running
label applied. When all policies are remediated to the cluster and are fully compliant, these directives cause the TALM to remove theztp-running
label and apply theztp-done
label.For deployments that make use of the
informDuValidator
policy, theztp-done
label is applied when the cluster is fully ready for deployment of applications. This includes all reconciliation and resulting effects of the ZTP applied configuration CRs. Theztp-done
label affects automaticClusterGroupUpgrade
CR creation by TALM. Do not manipulate this label after the initial ZTP installation of the cluster.- Linked CRs
-
The automatically created
ClusterGroupUpgrade
CR has the owner reference set as theManagedCluster
from which it was derived. This reference ensures that deleting theManagedCluster
CR causes the instance of theClusterGroupUpgrade
to be deleted along with any supporting resources.
19.3.2. Overview of deploying managed clusters with ZTP Copy linkLink copied to clipboard!
Red Hat Advanced Cluster Management (RHACM) uses zero touch provisioning (ZTP) to deploy single-node OpenShift Container Platform clusters, three-node clusters, and standard clusters. You manage site configuration data as OpenShift Container Platform custom resources (CRs) in a Git repository. ZTP uses a declarative GitOps approach for a develop once, deploy anywhere model to deploy the managed clusters.
The deployment of the clusters includes:
- Installing the host operating system (RHCOS) on a blank server
- Deploying OpenShift Container Platform
- Creating cluster policies and site subscriptions
- Making the necessary network configurations to the server operating system
- Deploying profile Operators and performing any needed software-related configuration, such as performance profile, PTP, and SR-IOV
Overview of the managed site installation process
After you apply the managed site custom resources (CRs) on the hub cluster, the following actions happen automatically:
- A Discovery image ISO file is generated and booted on the target host.
- When the ISO file successfully boots on the target host it reports the host hardware information to RHACM.
- After all hosts are discovered, OpenShift Container Platform is installed.
-
When OpenShift Container Platform finishes installing, the hub installs the
klusterlet
service on the target cluster. - The requested add-on services are installed on the target cluster.
The Discovery image ISO process is complete when the Agent
CR for the managed cluster is created on the hub cluster.
The target bare-metal host must meet the networking, firmware, and hardware requirements listed in Recommended single-node OpenShift cluster configuration for vDU application workloads.
19.3.3. Creating the managed bare-metal host secrets Copy linkLink copied to clipboard!
Add the required Secret
custom resources (CRs) for the managed bare-metal host to the hub cluster. You need a secret for the ZTP pipeline to access the Baseboard Management Controller (BMC) and a secret for the assisted installer service to pull cluster installation images from the registry.
The secrets are referenced from the SiteConfig
CR by name. The namespace must match the SiteConfig
namespace.
Procedure
Create a YAML secret file containing credentials for the host Baseboard Management Controller (BMC) and a pull secret required for installing OpenShift and all add-on cluster Operators:
Save the following YAML as the file
example-sno-secret.yaml
:Copy to Clipboard Copied! Toggle word wrap Toggle overflow
-
Add the relative path to
example-sno-secret.yaml
to thekustomization.yaml
file that you use to install the cluster.
19.3.4. Configuring Discovery ISO kernel arguments for installations using GitOps ZTP Copy linkLink copied to clipboard!
The GitOps ZTP workflow uses the Discovery ISO as part of the OpenShift Container Platform installation process on managed bare-metal hosts. You can edit the InfraEnv
resource to specify kernel arguments for the Discovery ISO. This is useful for cluster installations with specific environmental requirements. For example, configure the rd.net.timeout.carrier
kernel argument for the Discovery ISO to facilitate static networking for the cluster or to receive a DHCP address before downloading the root file system during installation.
In OpenShift Container Platform 4.12, you can only add kernel arguments. You can not replace or delete kernel arguments.
Prerequisites
- You have installed the OpenShift CLI (oc).
- You have logged in to the hub cluster as a user with cluster-admin privileges.
Procedure
Create the
InfraEnv
CR and edit thespec.kernelArguments
specification to configure kernel arguments.Save the following YAML in an
InfraEnv-example.yaml
file:NoteThe
InfraEnv
CR in this example uses template syntax such as{{ .Cluster.ClusterName }}
that is populated based on values in theSiteConfig
CR. TheSiteConfig
CR automatically populates values for these templates during deployment. Do not edit the templates manually.Copy to Clipboard Copied! Toggle word wrap Toggle overflow
Commit the
InfraEnv-example.yaml
CR to the same location in your Git repository that has theSiteConfig
CR and push your changes. The following example shows a sample Git repository structure:~/example-ztp/install └── site-install ├── siteconfig-example.yaml ├── InfraEnv-example.yaml ...
~/example-ztp/install └── site-install ├── siteconfig-example.yaml ├── InfraEnv-example.yaml ...
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Edit the
spec.clusters.crTemplates
specification in theSiteConfig
CR to reference theInfraEnv-example.yaml
CR in your Git repository:clusters: crTemplates: InfraEnv: "InfraEnv-example.yaml"
clusters: crTemplates: InfraEnv: "InfraEnv-example.yaml"
Copy to Clipboard Copied! Toggle word wrap Toggle overflow When you are ready to deploy your cluster by committing and pushing the
SiteConfig
CR, the build pipeline uses the customInfraEnv-example
CR in your Git repository to configure the infrastructure environment, including the custom kernel arguments.
Verification
To verify that the kernel arguments are applied, after the Discovery image verifies that OpenShift Container Platform is ready for installation, you can SSH to the target host before the installation process begins. At that point, you can view the kernel arguments for the Discovery ISO in the /proc/cmdline
file.
Begin an SSH session with the target host:
ssh -i /path/to/privatekey core@<host_name>
$ ssh -i /path/to/privatekey core@<host_name>
Copy to Clipboard Copied! Toggle word wrap Toggle overflow View the system’s kernel arguments by using the following command:
cat /proc/cmdline
$ cat /proc/cmdline
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
19.3.5. Deploying a managed cluster with SiteConfig and ZTP Copy linkLink copied to clipboard!
Use the following procedure to create a SiteConfig
custom resource (CR) and related files and initiate the zero touch provisioning (ZTP) cluster deployment.
Prerequisites
-
You have installed the OpenShift CLI (
oc
). -
You have logged in to the hub cluster as a user with
cluster-admin
privileges. - You configured the hub cluster for generating the required installation and policy CRs.
You created a Git repository where you manage your custom site configuration data. The repository must be accessible from the hub cluster and you must configure it as a source repository for the ArgoCD application. See "Preparing the GitOps ZTP site configuration repository" for more information.
NoteWhen you create the source repository, ensure that you patch the ArgoCD application with the
argocd/deployment/argocd-openshift-gitops-patch.json
patch-file that you extract from theztp-site-generate
container. See "Configuring the hub cluster with ArgoCD".To be ready for provisioning managed clusters, you require the following for each bare-metal host:
- Network connectivity
- Your network requires DNS. Managed cluster hosts should be reachable from the hub cluster. Ensure that Layer 3 connectivity exists between the hub cluster and the managed cluster host.
- Baseboard Management Controller (BMC) details
-
ZTP uses BMC username and password details to connect to the BMC during cluster installation. The GitOps ZTP plugin manages the
ManagedCluster
CRs on the hub cluster based on theSiteConfig
CR in your site Git repo. You create individualBMCSecret
CRs for each host manually.
Procedure
Create the required managed cluster secrets on the hub cluster. These resources must be in a namespace with a name matching the cluster name. For example, in
out/argocd/example/siteconfig/example-sno.yaml
, the cluster name and namespace isexample-sno
.Export the cluster namespace by running the following command:
export CLUSTERNS=example-sno
$ export CLUSTERNS=example-sno
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Create the namespace:
oc create namespace $CLUSTERNS
$ oc create namespace $CLUSTERNS
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
Create pull secret and BMC
Secret
CRs for the managed cluster. The pull secret must contain all the credentials necessary for installing OpenShift Container Platform and all required Operators. See "Creating the managed bare-metal host secrets" for more information.NoteThe secrets are referenced from the
SiteConfig
custom resource (CR) by name. The namespace must match theSiteConfig
namespace.Create a
SiteConfig
CR for your cluster in your local clone of the Git repository:Choose the appropriate example for your CR from the
out/argocd/example/siteconfig/
folder. The folder includes example files for single node, three-node, and standard clusters:-
example-sno.yaml
-
example-3node.yaml
-
example-standard.yaml
-
Change the cluster and host details in the example file to match the type of cluster you want. For example:
# example-node1-bmh-secret & assisted-deployment-pull-secret need to be created under same namespace example-sno --- apiVersion: ran.openshift.io/v1 kind: SiteConfig metadata: name: "example-sno" namespace: "example-sno" spec: baseDomain: "example.com" pullSecretRef: name: "assisted-deployment-pull-secret" clusterImageSetNameRef: "openshift-4.10" sshPublicKey: "ssh-rsa AAAA…" clusters: - clusterName: "example-sno" networkType: "OVNKubernetes" # installConfigOverrides is a generic way of passing install-config # parameters through the siteConfig. The 'capabilities' field configures # the composable openshift feature. In this 'capabilities' setting, we # remove all but the marketplace component from the optional set of # components. # Notes: # - OperatorLifecycleManager is needed for 4.15 and later # - NodeTuning is needed for 4.13 and later, not for 4.12 and earlier installConfigOverrides: | { "capabilities": { "baselineCapabilitySet": "None", "additionalEnabledCapabilities": [ "NodeTuning", "OperatorLifecycleManager" ] } } # It is strongly recommended to include crun manifests as part of the additional install-time manifests for 4.13+. # The crun manifests can be obtained from source-crs/optional-extra-manifest/ and added to the git repo ie.sno-extra-manifest. # extraManifestPath: sno-extra-manifest clusterLabels: # These example cluster labels correspond to the bindingRules in the PolicyGenTemplate examples du-profile: "latest" # These example cluster labels correspond to the bindingRules in the PolicyGenTemplate examples in ../policygentemplates: # ../policygentemplates/common-ranGen.yaml will apply to all clusters with 'common: true' common: true # ../policygentemplates/group-du-sno-ranGen.yaml will apply to all clusters with 'group-du-sno: ""' group-du-sno: "" # ../policygentemplates/example-sno-site.yaml will apply to all clusters with 'sites: "example-sno"' # Normally this should match or contain the cluster name so it only applies to a single cluster sites : "example-sno" clusterNetwork: - cidr: 1001:1::/48 hostPrefix: 64 machineNetwork: - cidr: 1111:2222:3333:4444::/64 serviceNetwork: - 1001:2::/112 additionalNTPSources: - 1111:2222:3333:4444::2 # Initiates the cluster for workload partitioning. Setting specific reserved/isolated CPUSets is done via PolicyTemplate # please see Workload Partitioning Feature for a complete guide. cpuPartitioningMode: AllNodes # Optionally; This can be used to override the KlusterletAddonConfig that is created for this cluster: crTemplates: # KlusterletAddonConfig: "KlusterletAddonConfigOverride.yaml" nodes: - hostName: "example-node1.example.com" role: "master" # Optionally; This can be used to configure desired BIOS setting on a host: #biosConfigRef: # filePath: "example-hw.profile" bmcAddress: "idrac-virtualmedia+https://[1111:2222:3333:4444::bbbb:1]/redfish/v1/Systems/System.Embedded.1" bmcCredentialsName: name: "example-node1-bmh-secret" bootMACAddress: "AA:BB:CC:DD:EE:11" # Use UEFISecureBoot to enable secure boot bootMode: "UEFI" rootDeviceHints: deviceName: "/dev/disk/by-path/pci-0000:01:00.0-scsi-0:2:0:0" # disk partition at
/var/lib/containers
with ignitionConfigOverride. Some values must be updated. See DiskPartitionContainer.md for more details ignitionConfigOverride: | { "ignition": { "version": "3.2.0" }, "storage": { "disks": [ { "device": "/dev/disk/by-path/pci-0000:01:00.0-scsi-0:2:0:0", "partitions": [ { "label": "var-lib-containers", "sizeMiB": 0, "startMiB": 250000 } ], "wipeTable": false } ], "filesystems": [ { "device": "/dev/disk/by-partlabel/var-lib-containers", "format": "xfs", "mountOptions": [ "defaults", "prjquota" ], "path": "/var/lib/containers", "wipeFilesystem": true } ] }, "systemd": { "units": [ { "contents": " Generated by Butane\n[Unit]\nRequires=systemd-fsck@dev-disk-by\\x2dpartlabel-var\\x2dlib\\x2dcontainers.service\nAfter=systemd-fsck@dev-disk-by\\x2dpartlabel-var\\x2dlib\\x2dcontainers.service\n\n[Mount]\nWhere=/var/lib/containers\nWhat=/dev/disk/by-partlabel/var-lib-containers\nType=xfs\nOptions=defaults,prjquota\n\n[Install]\nRequiredBy=local-fs.target", "enabled": true, "name": "var-lib-containers.mount" } ] } } nodeNetwork: interfaces: - name: eno1 macAddress: "AA:BB:CC:DD:EE:11" config: interfaces: - name: eno1 type: ethernet state: up ipv4: enabled: false ipv6: enabled: true address: # For SNO sites with static IP addresses, the node-specific, # API and Ingress IPs should all be the same and configured on # the interface - ip: 1111:2222:3333:4444::aaaa:1 prefix-length: 64 dns-resolver: config: search: - example.com server: - 1111:2222:3333:4444::2 routes: config: - destination: ::/0 next-hop-interface: eno1 next-hop-address: 1111:2222:3333:4444::1 table-id: 254+
NoteFor more information about BMC addressing, see the "Additional resources" section. The
installConfigOverrides
andignitionConfigOverride
fields are expanded in the example for ease of readability.-
You can inspect the default set of extra-manifest
MachineConfig
CRs inout/argocd/extra-manifest
. It is automatically applied to the cluster when it is installed. -
Optional: To provision additional install-time manifests on the provisioned cluster, create a directory in your Git repository, for example,
sno-extra-manifest/
, and add your custom manifest CRs to this directory. If yourSiteConfig.yaml
refers to this directory in theextraManifestPath
field, any CRs in this referenced directory are appended to the default set of extra manifests.
-
Add the
SiteConfig
CR to thekustomization.yaml
file in thegenerators
section, similar to the example shown inout/argocd/example/siteconfig/kustomization.yaml
. Commit the
SiteConfig
CR and associatedkustomization.yaml
changes in your Git repository and push the changes.The ArgoCD pipeline detects the changes and begins the managed cluster deployment.
19.3.5.1. Single-node OpenShift SiteConfig CR installation reference Copy linkLink copied to clipboard!
SiteConfig CR field | Description |
---|---|
|
Set |
|
Configure the image set available on the hub cluster for all the clusters in the site. To see the list of supported versions on your hub cluster, run |
|
Set the Important
Use the reference configuration as specified in the example |
|
Configure cluster labels to correspond to the |
|
Optional. Set |
|
For single-node deployments, define a single host. For three-node deployments, define three hosts. For standard deployments, define three hosts with |
| BMC address that you use to access the host. Applies to all cluster types. {ztp} supports iPXE and virtual media booting by using Redfish or IPMI protocols. To use iPXE booting, you must use RHACM 2.8 or later. For more information about BMC addressing, see the "Additional resources" section. |
| BMC address that you use to access the host. Applies to all cluster types. {ztp} supports iPXE and virtual media booting by using Redfish or IPMI protocols. To use iPXE booting, you must use RHACM 2.8 or later. For more information about BMC addressing, see the "Additional resources" section. Note In far edge Telco use cases, only virtual media is supported for use with {ztp}. |
|
Configure the |
|
Set the boot mode for the host to |
|
Specifies the device for deployment. Identifiers that are stable across reboots are recommended. For example, |
|
Configure |
| Configure the network settings for the node. |
| Configure the IPv6 address for the host. For single-node OpenShift clusters with static IP addresses, the node-specific API and Ingress IPs should be the same. |
19.3.6. Monitoring managed cluster installation progress Copy linkLink copied to clipboard!
The ArgoCD pipeline uses the SiteConfig
CR to generate the cluster configuration CRs and syncs it with the hub cluster. You can monitor the progress of the synchronization in the ArgoCD dashboard.
Prerequisites
-
You have installed the OpenShift CLI (
oc
). -
You have logged in to the hub cluster as a user with
cluster-admin
privileges.
Procedure
When the synchronization is complete, the installation generally proceeds as follows:
The Assisted Service Operator installs OpenShift Container Platform on the cluster. You can monitor the progress of cluster installation from the RHACM dashboard or from the command line by running the following commands:
Export the cluster name:
export CLUSTER=<clusterName>
$ export CLUSTER=<clusterName>
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Query the
AgentClusterInstall
CR for the managed cluster:oc get agentclusterinstall -n $CLUSTER $CLUSTER -o jsonpath='{.status.conditions[?(@.type=="Completed")]}' | jq
$ oc get agentclusterinstall -n $CLUSTER $CLUSTER -o jsonpath='{.status.conditions[?(@.type=="Completed")]}' | jq
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Get the installation events for the cluster:
curl -sk $(oc get agentclusterinstall -n $CLUSTER $CLUSTER -o jsonpath='{.status.debugInfo.eventsURL}') | jq '.[-2,-1]'
$ curl -sk $(oc get agentclusterinstall -n $CLUSTER $CLUSTER -o jsonpath='{.status.debugInfo.eventsURL}') | jq '.[-2,-1]'
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
19.3.7. Troubleshooting GitOps ZTP by validating the installation CRs Copy linkLink copied to clipboard!
The ArgoCD pipeline uses the SiteConfig
and PolicyGenTemplate
custom resources (CRs) to generate the cluster configuration CRs and Red Hat Advanced Cluster Management (RHACM) policies. Use the following steps to troubleshoot issues that might occur during this process.
Prerequisites
-
You have installed the OpenShift CLI (
oc
). -
You have logged in to the hub cluster as a user with
cluster-admin
privileges.
Procedure
Check that the installation CRs were created by using the following command:
oc get AgentClusterInstall -n <cluster_name>
$ oc get AgentClusterInstall -n <cluster_name>
Copy to Clipboard Copied! Toggle word wrap Toggle overflow If no object is returned, use the following steps to troubleshoot the ArgoCD pipeline flow from
SiteConfig
files to the installation CRs.Verify that the
ManagedCluster
CR was generated using theSiteConfig
CR on the hub cluster:oc get managedcluster
$ oc get managedcluster
Copy to Clipboard Copied! Toggle word wrap Toggle overflow If the
ManagedCluster
is missing, check if theclusters
application failed to synchronize the files from the Git repository to the hub cluster:oc describe -n openshift-gitops application clusters
$ oc describe -n openshift-gitops application clusters
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Check for the
Status.Conditions
field to view the error logs for the managed cluster. For example, setting an invalid value forextraManifestPath:
in theSiteConfig
CR raises the following error:Status: Conditions: Last Transition Time: 2021-11-26T17:21:39Z Message: rpc error: code = Unknown desc = `kustomize build /tmp/https___git.com/ran-sites/siteconfigs/ --enable-alpha-plugins` failed exit status 1: 2021/11/26 17:21:40 Error could not create extra-manifest ranSite1.extra-manifest3 stat extra-manifest3: no such file or directory 2021/11/26 17:21:40 Error: could not build the entire SiteConfig defined by /tmp/kust-plugin-config-913473579: stat extra-manifest3: no such file or directory Error: failure in plugin configured via /tmp/kust-plugin-config-913473579; exit status 1: exit status 1 Type: ComparisonError
Status: Conditions: Last Transition Time: 2021-11-26T17:21:39Z Message: rpc error: code = Unknown desc = `kustomize build /tmp/https___git.com/ran-sites/siteconfigs/ --enable-alpha-plugins` failed exit status 1: 2021/11/26 17:21:40 Error could not create extra-manifest ranSite1.extra-manifest3 stat extra-manifest3: no such file or directory 2021/11/26 17:21:40 Error: could not build the entire SiteConfig defined by /tmp/kust-plugin-config-913473579: stat extra-manifest3: no such file or directory Error: failure in plugin configured via /tmp/kust-plugin-config-913473579; exit status 1: exit status 1 Type: ComparisonError
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Check the
Status.Sync
field. If there are log errors, theStatus.Sync
field could indicate anUnknown
error:Copy to Clipboard Copied! Toggle word wrap Toggle overflow
19.3.8. Troubleshooting {ztp} virtual media booting on Supermicro servers Copy linkLink copied to clipboard!
SuperMicro X11 servers do not support virtual media installations when the image is served using the https
protocol. As a result, single-node OpenShift deployments for this environment fail to boot on the target node. To avoid this issue, log in to the hub cluster and disable Transport Layer Security (TLS) in the Provisioning
resource. This ensures the image is not served with TLS even though the image address uses the https
scheme.
Prerequisites
-
You have installed the OpenShift CLI (
oc
). -
You have logged in to the hub cluster as a user with
cluster-admin
privileges.
Procedure
Disable TLS in the
Provisioning
resource by running the following command:oc patch provisioning provisioning-configuration --type merge -p '{"spec":{"disableVirtualMediaTLS": true}}'
$ oc patch provisioning provisioning-configuration --type merge -p '{"spec":{"disableVirtualMediaTLS": true}}'
Copy to Clipboard Copied! Toggle word wrap Toggle overflow - Continue the steps to deploy your single-node OpenShift cluster.
19.3.9. Removing a managed cluster site from the ZTP pipeline Copy linkLink copied to clipboard!
You can remove a managed site and the associated installation and configuration policy CRs from the ZTP pipeline.
Prerequisites
-
You have installed the OpenShift CLI (
oc
). -
You have logged in to the hub cluster as a user with
cluster-admin
privileges.
Procedure
Remove a site and the associated CRs by removing the associated
SiteConfig
andPolicyGenTemplate
files from thekustomization.yaml
file.When you run the ZTP pipeline again, the generated CRs are removed.
-
Optional: If you want to permanently remove a site, you should also remove the
SiteConfig
and site-specificPolicyGenTemplate
files from the Git repository. -
Optional: If you want to remove a site temporarily, for example when redeploying a site, you can leave the
SiteConfig
and site-specificPolicyGenTemplate
CRs in the Git repository.
19.3.10. Removing obsolete content from the ZTP pipeline Copy linkLink copied to clipboard!
If a change to the PolicyGenTemplate
configuration results in obsolete policies, for example, if you rename policies, use the following procedure to remove the obsolete policies.
Prerequisites
-
You have installed the OpenShift CLI (
oc
). -
You have logged in to the hub cluster as a user with
cluster-admin
privileges.
Procedure
-
Remove the affected
PolicyGenTemplate
files from the Git repository, commit and push to the remote repository. - Wait for the changes to synchronize through the application and the affected policies to be removed from the hub cluster.
Add the updated
PolicyGenTemplate
files back to the Git repository, and then commit and push to the remote repository.NoteRemoving zero touch provisioning (ZTP) policies from the Git repository, and as a result also removing them from the hub cluster, does not affect the configuration of the managed cluster. The policy and CRs managed by that policy remains in place on the managed cluster.
Optional: As an alternative, after making changes to
PolicyGenTemplate
CRs that result in obsolete policies, you can remove these policies from the hub cluster manually. You can delete policies from the RHACM console using the Governance tab or by running the following command:oc delete policy -n <namespace> <policy_name>
$ oc delete policy -n <namespace> <policy_name>
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
19.3.11. Tearing down the ZTP pipeline Copy linkLink copied to clipboard!
You can remove the ArgoCD pipeline and all generated ZTP artifacts.
Prerequisites
-
You have installed the OpenShift CLI (
oc
). -
You have logged in to the hub cluster as a user with
cluster-admin
privileges.
Procedure
- Detach all clusters from Red Hat Advanced Cluster Management (RHACM) on the hub cluster.
Delete the
kustomization.yaml
file in thedeployment
directory using the following command:oc delete -k out/argocd/deployment
$ oc delete -k out/argocd/deployment
Copy to Clipboard Copied! Toggle word wrap Toggle overflow - Commit and push your changes to the site repository.
19.4. Configuring managed clusters with policies and PolicyGenTemplate resources Copy linkLink copied to clipboard!
Applied policy custom resources (CRs) configure the managed clusters that you provision. You can customize how Red Hat Advanced Cluster Management (RHACM) uses PolicyGenTemplate
CRs to generate the applied policy CRs.
19.4.1. About the PolicyGenTemplate CRD Copy linkLink copied to clipboard!
The PolicyGenTemplate
custom resource definition (CRD) tells the PolicyGen
policy generator what custom resources (CRs) to include in the cluster configuration, how to combine the CRs into the generated policies, and what items in those CRs need to be updated with overlay content.
The following example shows a PolicyGenTemplate
CR (common-du-ranGen.yaml
) extracted from the ztp-site-generate
reference container. The common-du-ranGen.yaml
file defines two Red Hat Advanced Cluster Management (RHACM) policies. The polices manage a collection of configuration CRs, one for each unique value of policyName
in the CR. common-du-ranGen.yaml
creates a single placement binding and a placement rule to bind the policies to clusters based on the labels listed in the bindingRules
section.
Example PolicyGenTemplate CR - common-du-ranGen.yaml
- 1
common: "true"
applies the policies to all clusters with this label.- 2
- Files listed under
sourceFiles
create the Operator policies for installed clusters. - 3
OperatorHub.yaml
configures the OperatorHub for the disconnected registry.- 4
DefaultCatsrc.yaml
configures the catalog source for the disconnected registry.- 5
policyName: "config-policy"
configures Operator subscriptions. TheOperatorHub
CR disables the default and this CR replacesredhat-operators
with aCatalogSource
CR that points to the disconnected registry.
A PolicyGenTemplate
CR can be constructed with any number of included CRs. Apply the following example CR in the hub cluster to generate a policy containing a single CR:
Using the source file PtpConfigSlave.yaml
as an example, the file defines a PtpConfig
CR. The generated policy for the PtpConfigSlave
example is named group-du-sno-config-policy
. The PtpConfig
CR defined in the generated group-du-sno-config-policy
is named du-ptp-slave
. The spec
defined in PtpConfigSlave.yaml
is placed under du-ptp-slave
along with the other spec
items defined under the source file.
The following example shows the group-du-sno-config-policy
CR:
19.4.2. Recommendations when customizing PolicyGenTemplate CRs Copy linkLink copied to clipboard!
Consider the following best practices when customizing site configuration PolicyGenTemplate
custom resources (CRs):
-
Use as few policies as are necessary. Using fewer policies requires less resources. Each additional policy creates overhead for the hub cluster and the deployed managed cluster. CRs are combined into policies based on the
policyName
field in thePolicyGenTemplate
CR. CRs in the samePolicyGenTemplate
which have the same value forpolicyName
are managed under a single policy. -
In disconnected environments, use a single catalog source for all Operators by configuring the registry as a single index containing all Operators. Each additional
CatalogSource
CR on the managed clusters increases CPU usage. -
MachineConfig
CRs should be included asextraManifests
in theSiteConfig
CR so that they are applied during installation. This can reduce the overall time taken until the cluster is ready to deploy applications. -
PolicyGenTemplates
should override the channel field to explicitly identify the desired version. This ensures that changes in the source CR during upgrades does not update the generated subscription.
When managing large numbers of spoke clusters on the hub cluster, minimize the number of policies to reduce resource consumption.
Grouping multiple configuration CRs into a single or limited number of policies is one way to reduce the overall number of policies on the hub cluster. When using the common, group, and site hierarchy of policies for managing site configuration, it is especially important to combine site-specific configuration into a single policy.
19.4.3. PolicyGenTemplate CRs for RAN deployments Copy linkLink copied to clipboard!
Use PolicyGenTemplate
(PGT) custom resources (CRs) to customize the configuration applied to the cluster by using the GitOps zero touch provisioning (ZTP) pipeline. The PGT CR allows you to generate one or more policies to manage the set of configuration CRs on your fleet of clusters. The PGT identifies the set of managed CRs, bundles them into policies, builds the policy wrapping around those CRs, and associates the policies with clusters by using label binding rules.
The reference configuration, obtained from the GitOps ZTP container, is designed to provide a set of critical features and node tuning settings that ensure the cluster can support the stringent performance and resource utilization constraints typical of RAN (Radio Access Network) Distributed Unit (DU) applications. Changes or omissions from the baseline configuration can affect feature availability, performance, and resource utilization. Use the reference PolicyGenTemplate
CRs as the basis to create a hierarchy of configuration files tailored to your specific site requirements.
The baseline PolicyGenTemplate
CRs that are defined for RAN DU cluster configuration can be extracted from the GitOps ZTP ztp-site-generate
container. See "Preparing the GitOps ZTP site configuration repository" for further details.
The PolicyGenTemplate
CRs can be found in the ./out/argocd/example/policygentemplates
folder. The reference architecture has common, group, and site-specific configuration CRs. Each PolicyGenTemplate
CR refers to other CRs that can be found in the ./out/source-crs
folder.
The PolicyGenTemplate
CRs relevant to RAN cluster configuration are described below. Variants are provided for the group PolicyGenTemplate
CRs to account for differences in single-node, three-node compact, and standard cluster configurations. Similarly, site-specific configuration variants are provided for single-node clusters and multi-node (compact or standard) clusters. Use the group and site-specific configuration variants that are relevant for your deployment.
PolicyGenTemplate CR | Description |
---|---|
| Contains a set of CRs that get applied to multi-node clusters. These CRs configure SR-IOV features typical for RAN installations. |
| Contains a set of CRs that get applied to single-node OpenShift clusters. These CRs configure SR-IOV features typical for RAN installations. |
| Contains a set of common RAN CRs that get applied to all clusters. These CRs subscribe to a set of operators providing cluster features typical for RAN as well as baseline cluster tuning. |
| Contains the RAN policies for three-node clusters only. |
| Contains the RAN policies for single-node clusters only. |
| Contains the RAN policies for standard three control-plane clusters. |
|
|
|
|
|
|
19.4.4. Customizing a managed cluster with PolicyGenTemplate CRs Copy linkLink copied to clipboard!
Use the following procedure to customize the policies that get applied to the managed cluster that you provision using the zero touch provisioning (ZTP) pipeline.
Prerequisites
-
You have installed the OpenShift CLI (
oc
). -
You have logged in to the hub cluster as a user with
cluster-admin
privileges. - You configured the hub cluster for generating the required installation and policy CRs.
- You created a Git repository where you manage your custom site configuration data. The repository must be accessible from the hub cluster and be defined as a source repository for the Argo CD application.
Procedure
Create a
PolicyGenTemplate
CR for site-specific configuration CRs.-
Choose the appropriate example for your CR from the
out/argocd/example/policygentemplates
folder, for example,example-sno-site.yaml
orexample-multinode-site.yaml
. Change the
bindingRules
field in the example file to match the site-specific label included in theSiteConfig
CR. In the exampleSiteConfig
file, the site-specific label issites: example-sno
.NoteEnsure that the labels defined in your
PolicyGenTemplate
bindingRules
field correspond to the labels that are defined in the related managed clustersSiteConfig
CR.- Change the content in the example file to match the desired configuration.
-
Choose the appropriate example for your CR from the
Optional: Create a
PolicyGenTemplate
CR for any common configuration CRs that apply to the entire fleet of clusters.-
Select the appropriate example for your CR from the
out/argocd/example/policygentemplates
folder, for example,common-ranGen.yaml
. - Change the content in the example file to match the desired configuration.
-
Select the appropriate example for your CR from the
Optional: Create a
PolicyGenTemplate
CR for any group configuration CRs that apply to the certain groups of clusters in the fleet.Ensure that the content of the overlaid spec files matches your desired end state. As a reference, the out/source-crs directory contains the full list of source-crs available to be included and overlaid by your PolicyGenTemplate templates.
NoteDepending on the specific requirements of your clusters, you might need more than a single group policy per cluster type, especially considering that the example group policies each have a single PerformancePolicy.yaml file that can only be shared across a set of clusters if those clusters consist of identical hardware configurations.
-
Select the appropriate example for your CR from the
out/argocd/example/policygentemplates
folder, for example,group-du-sno-ranGen.yaml
. - Change the content in the example file to match the desired configuration.
-
Select the appropriate example for your CR from the
-
Optional. Create a validator inform policy
PolicyGenTemplate
CR to signal when the ZTP installation and configuration of the deployed cluster is complete. For more information, see "Creating a validator inform policy". Define all the policy namespaces in a YAML file similar to the example
out/argocd/example/policygentemplates/ns.yaml
file.ImportantDo not include the
Namespace
CR in the same file with thePolicyGenTemplate
CR.-
Add the
PolicyGenTemplate
CRs andNamespace
CR to thekustomization.yaml
file in the generators section, similar to the example shown inout/argocd/example/policygentemplates/kustomization.yaml
. Commit the
PolicyGenTemplate
CRs,Namespace
CR, and associatedkustomization.yaml
file in your Git repository and push the changes.The ArgoCD pipeline detects the changes and begins the managed cluster deployment. You can push the changes to the
SiteConfig
CR and thePolicyGenTemplate
CR simultaneously.
19.4.5. Monitoring managed cluster policy deployment progress Copy linkLink copied to clipboard!
The ArgoCD pipeline uses PolicyGenTemplate
CRs in Git to generate the RHACM policies and then sync them to the hub cluster. You can monitor the progress of the managed cluster policy synchronization after the assisted service installs OpenShift Container Platform on the managed cluster.
Prerequisites
-
You have installed the OpenShift CLI (
oc
). -
You have logged in to the hub cluster as a user with
cluster-admin
privileges.
Procedure
The Topology Aware Lifecycle Manager (TALM) applies the configuration policies that are bound to the cluster.
After the cluster installation is complete and the cluster becomes
Ready
, aClusterGroupUpgrade
CR corresponding to this cluster, with a list of ordered policies defined by theran.openshift.io/ztp-deploy-wave annotations
, is automatically created by the TALM. The cluster’s policies are applied in the order listed inClusterGroupUpgrade
CR.You can monitor the high-level progress of configuration policy reconciliation by using the following commands:
export CLUSTER=<clusterName>
$ export CLUSTER=<clusterName>
Copy to Clipboard Copied! Toggle word wrap Toggle overflow oc get clustergroupupgrades -n ztp-install $CLUSTER -o jsonpath='{.status.conditions[-1:]}' | jq
$ oc get clustergroupupgrades -n ztp-install $CLUSTER -o jsonpath='{.status.conditions[-1:]}' | jq
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
Copy to Clipboard Copied! Toggle word wrap Toggle overflow You can monitor the detailed cluster policy compliance status by using the RHACM dashboard or the command line.
To check policy compliance by using
oc
, run the following command:oc get policies -n $CLUSTER
$ oc get policies -n $CLUSTER
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
Copy to Clipboard Copied! Toggle word wrap Toggle overflow To check policy status from the RHACM web console, perform the following actions:
-
Click Governance
Find policies. - Click on a cluster policy to check it’s status.
-
Click Governance
When all of the cluster policies become compliant, ZTP installation and configuration for the cluster is complete. The ztp-done
label is added to the cluster.
In the reference configuration, the final policy that becomes compliant is the one defined in the *-du-validator-policy
policy. This policy, when compliant on a cluster, ensures that all cluster configuration, Operator installation, and Operator configuration is complete.
19.4.6. Validating the generation of configuration policy CRs Copy linkLink copied to clipboard!
Policy custom resources (CRs) are generated in the same namespace as the PolicyGenTemplate
from which they are created. The same troubleshooting flow applies to all policy CRs generated from a PolicyGenTemplate
regardless of whether they are ztp-common
, ztp-group
, or ztp-site
based, as shown using the following commands:
export NS=<namespace>
$ export NS=<namespace>
oc get policy -n $NS
$ oc get policy -n $NS
The expected set of policy-wrapped CRs should be displayed.
If the policies failed synchronization, use the following troubleshooting steps.
Procedure
To display detailed information about the policies, run the following command:
oc describe -n openshift-gitops application policies
$ oc describe -n openshift-gitops application policies
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Check for
Status: Conditions:
to show the error logs. For example, setting an invalidsourceFile→fileName:
generates the error shown below:Status: Conditions: Last Transition Time: 2021-11-26T17:21:39Z Message: rpc error: code = Unknown desc = `kustomize build /tmp/https___git.com/ran-sites/policies/ --enable-alpha-plugins` failed exit status 1: 2021/11/26 17:21:40 Error could not find test.yaml under source-crs/: no such file or directory Error: failure in plugin configured via /tmp/kust-plugin-config-52463179; exit status 1: exit status 1 Type: ComparisonError
Status: Conditions: Last Transition Time: 2021-11-26T17:21:39Z Message: rpc error: code = Unknown desc = `kustomize build /tmp/https___git.com/ran-sites/policies/ --enable-alpha-plugins` failed exit status 1: 2021/11/26 17:21:40 Error could not find test.yaml under source-crs/: no such file or directory Error: failure in plugin configured via /tmp/kust-plugin-config-52463179; exit status 1: exit status 1 Type: ComparisonError
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Check for
Status: Sync:
. If there are log errors atStatus: Conditions:
, theStatus: Sync:
showsUnknown
orError
:Copy to Clipboard Copied! Toggle word wrap Toggle overflow When Red Hat Advanced Cluster Management (RHACM) recognizes that policies apply to a
ManagedCluster
object, the policy CR objects are applied to the cluster namespace. Check to see if the policies were copied to the cluster namespace:oc get policy -n $CLUSTER
$ oc get policy -n $CLUSTER
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example output:
Copy to Clipboard Copied! Toggle word wrap Toggle overflow RHACM copies all applicable policies into the cluster namespace. The copied policy names have the format:
<policyGenTemplate.Namespace>.<policyGenTemplate.Name>-<policyName>
.Check the placement rule for any policies not copied to the cluster namespace. The
matchSelector
in thePlacementRule
for those policies should match labels on theManagedCluster
object:oc get placementrule -n $NS
$ oc get placementrule -n $NS
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Note the
PlacementRule
name appropriate for the missing policy, common, group, or site, using the following command:oc get placementrule -n $NS <placementRuleName> -o yaml
$ oc get placementrule -n $NS <placementRuleName> -o yaml
Copy to Clipboard Copied! Toggle word wrap Toggle overflow - The status-decisions should include your cluster name.
-
The key-value pair of the
matchSelector
in the spec must match the labels on your managed cluster.
Check the labels on the
ManagedCluster
object using the following command:oc get ManagedCluster $CLUSTER -o jsonpath='{.metadata.labels}' | jq
$ oc get ManagedCluster $CLUSTER -o jsonpath='{.metadata.labels}' | jq
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Check to see which policies are compliant using the following command:
oc get policy -n $CLUSTER
$ oc get policy -n $CLUSTER
Copy to Clipboard Copied! Toggle word wrap Toggle overflow If the
Namespace
,OperatorGroup
, andSubscription
policies are compliant but the Operator configuration policies are not, it is likely that the Operators did not install on the managed cluster. This causes the Operator configuration policies to fail to apply because the CRD is not yet applied to the spoke.
19.4.7. Restarting policy reconciliation Copy linkLink copied to clipboard!
You can restart policy reconciliation when unexpected compliance issues occur, for example, when the ClusterGroupUpgrade
custom resource (CR) has timed out.
Procedure
A
ClusterGroupUpgrade
CR is generated in the namespaceztp-install
by the Topology Aware Lifecycle Manager after the managed cluster becomesReady
:export CLUSTER=<clusterName>
$ export CLUSTER=<clusterName>
Copy to Clipboard Copied! Toggle word wrap Toggle overflow oc get clustergroupupgrades -n ztp-install $CLUSTER
$ oc get clustergroupupgrades -n ztp-install $CLUSTER
Copy to Clipboard Copied! Toggle word wrap Toggle overflow If there are unexpected issues and the policies fail to become complaint within the configured timeout (the default is 4 hours), the status of the
ClusterGroupUpgrade
CR showsUpgradeTimedOut
:oc get clustergroupupgrades -n ztp-install $CLUSTER -o jsonpath='{.status.conditions[?(@.type=="Ready")]}'
$ oc get clustergroupupgrades -n ztp-install $CLUSTER -o jsonpath='{.status.conditions[?(@.type=="Ready")]}'
Copy to Clipboard Copied! Toggle word wrap Toggle overflow A
ClusterGroupUpgrade
CR in theUpgradeTimedOut
state automatically restarts its policy reconciliation every hour. If you have changed your policies, you can start a retry immediately by deleting the existingClusterGroupUpgrade
CR. This triggers the automatic creation of a newClusterGroupUpgrade
CR that begins reconciling the policies immediately:oc delete clustergroupupgrades -n ztp-install $CLUSTER
$ oc delete clustergroupupgrades -n ztp-install $CLUSTER
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
Note that when the ClusterGroupUpgrade
CR completes with status UpgradeCompleted
and the managed cluster has the label ztp-done
applied, you can make additional configuration changes using PolicyGenTemplate
. Deleting the existing ClusterGroupUpgrade
CR will not make the TALM generate a new CR.
At this point, ZTP has completed its interaction with the cluster and any further interactions should be treated as an update and a new ClusterGroupUpgrade
CR created for remediation of the policies.
19.4.8. Changing applied managed cluster CRs using policies Copy linkLink copied to clipboard!
You can remove content from a custom resource (CR) that is deployed in a managed cluster through a policy.
By default, all Policy
CRs created from a PolicyGenTemplate
CR have the complianceType
field set to musthave
. A musthave
policy without the removed content is still compliant because the CR on the managed cluster has all the specified content. With this configuration, when you remove content from a CR, TALM removes the content from the policy but the content is not removed from the CR on the managed cluster.
With the complianceType
field to mustonlyhave
, the policy ensures that the CR on the cluster is an exact match of what is specified in the policy.
Prerequisites
-
You have installed the OpenShift CLI (
oc
). -
You have logged in to the hub cluster as a user with
cluster-admin
privileges. - You have deployed a managed cluster from a hub cluster running RHACM.
- You have installed Topology Aware Lifecycle Manager on the hub cluster.
Procedure
Remove the content that you no longer need from the affected CRs. In this example, the
disableDrain: false
line was removed from theSriovOperatorConfig
CR.Example CR
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Change the
complianceType
of the affected policies tomustonlyhave
in thegroup-du-sno-ranGen.yaml
file.Example YAML
# ... - fileName: SriovOperatorConfig.yaml policyName: "config-policy" complianceType: mustonlyhave # ...
# ... - fileName: SriovOperatorConfig.yaml policyName: "config-policy" complianceType: mustonlyhave # ...
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Create a
ClusterGroupUpdates
CR and specify the clusters that must receive the CR changes::Example ClusterGroupUpdates CR
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Create the
ClusterGroupUpgrade
CR by running the following command:oc create -f cgu-remove.yaml
$ oc create -f cgu-remove.yaml
Copy to Clipboard Copied! Toggle word wrap Toggle overflow When you are ready to apply the changes, for example, during an appropriate maintenance window, change the value of the
spec.enable
field totrue
by running the following command:oc --namespace=default patch clustergroupupgrade.ran.openshift.io/cgu-remove \ --patch '{"spec":{"enable":true}}' --type=merge
$ oc --namespace=default patch clustergroupupgrade.ran.openshift.io/cgu-remove \ --patch '{"spec":{"enable":true}}' --type=merge
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
Verification
Check the status of the policies by running the following command:
oc get <kind> <changed_cr_name>
$ oc get <kind> <changed_cr_name>
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
NAMESPACE NAME REMEDIATION ACTION COMPLIANCE STATE AGE default cgu-ztp-group.group-du-sno-config-policy enforce 17m default ztp-group.group-du-sno-config-policy inform NonCompliant 15h
NAMESPACE NAME REMEDIATION ACTION COMPLIANCE STATE AGE default cgu-ztp-group.group-du-sno-config-policy enforce 17m default ztp-group.group-du-sno-config-policy inform NonCompliant 15h
Copy to Clipboard Copied! Toggle word wrap Toggle overflow When the
COMPLIANCE STATE
of the policy isCompliant
, it means that the CR is updated and the unwanted content is removed.Check that the policies are removed from the targeted clusters by running the following command on the managed clusters:
oc get <kind> <changed_cr_name>
$ oc get <kind> <changed_cr_name>
Copy to Clipboard Copied! Toggle word wrap Toggle overflow If there are no results, the CR is removed from the managed cluster.
19.4.9. Indication of done for ZTP installations Copy linkLink copied to clipboard!
Zero touch provisioning (ZTP) simplifies the process of checking the ZTP installation status for a cluster. The ZTP status moves through three phases: cluster installation, cluster configuration, and ZTP done.
- Cluster installation phase
-
The cluster installation phase is shown by the
ManagedClusterJoined
andManagedClusterAvailable
conditions in theManagedCluster
CR . If theManagedCluster
CR does not have these conditions, or the condition is set toFalse
, the cluster is still in the installation phase. Additional details about installation are available from theAgentClusterInstall
andClusterDeployment
CRs. For more information, see "Troubleshooting GitOps ZTP". - Cluster configuration phase
-
The cluster configuration phase is shown by a
ztp-running
label applied theManagedCluster
CR for the cluster. - ZTP done
Cluster installation and configuration is complete in the ZTP done phase. This is shown by the removal of the
ztp-running
label and addition of theztp-done
label to theManagedCluster
CR. Theztp-done
label shows that the configuration has been applied and the baseline DU configuration has completed cluster tuning.The transition to the ZTP done state is conditional on the compliant state of a Red Hat Advanced Cluster Management (RHACM) validator inform policy. This policy captures the existing criteria for a completed installation and validates that it moves to a compliant state only when ZTP provisioning of the managed cluster is complete.
The validator inform policy ensures the configuration of the cluster is fully applied and Operators have completed their initialization. The policy validates the following:
-
The target
MachineConfigPool
contains the expected entries and has finished updating. All nodes are available and not degraded. -
The SR-IOV Operator has completed initialization as indicated by at least one
SriovNetworkNodeState
withsyncStatus: Succeeded
. - The PTP Operator daemon set exists.
-
The target
19.5. Manually installing a single-node OpenShift cluster with ZTP Copy linkLink copied to clipboard!
You can deploy a managed single-node OpenShift cluster by using Red Hat Advanced Cluster Management (RHACM) and the assisted service.
If you are creating multiple managed clusters, use the SiteConfig
method described in Deploying far edge sites with ZTP.
The target bare-metal host must meet the networking, firmware, and hardware requirements listed in Recommended cluster configuration for vDU application workloads.
19.5.1. Generating ZTP installation and configuration CRs manually Copy linkLink copied to clipboard!
Use the generator
entrypoint for the ztp-site-generate
container to generate the site installation and configuration custom resource (CRs) for a cluster based on SiteConfig
and PolicyGenTemplate
CRs.
Prerequisites
-
You have installed the OpenShift CLI (
oc
). -
You have logged in to the hub cluster as a user with
cluster-admin
privileges.
Procedure
Create an output folder by running the following command:
mkdir -p ./out
$ mkdir -p ./out
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Export the
argocd
directory from theztp-site-generate
container image:podman run --log-driver=none --rm registry.redhat.io/openshift4/ztp-site-generate-rhel8:v4.12 extract /home/ztp --tar | tar x -C ./out
$ podman run --log-driver=none --rm registry.redhat.io/openshift4/ztp-site-generate-rhel8:v4.12 extract /home/ztp --tar | tar x -C ./out
Copy to Clipboard Copied! Toggle word wrap Toggle overflow The
./out
directory has the referencePolicyGenTemplate
andSiteConfig
CRs in theout/argocd/example/
folder.Example output
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Create an output folder for the site installation CRs:
mkdir -p ./site-install
$ mkdir -p ./site-install
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Modify the example
SiteConfig
CR for the cluster type that you want to install. Copyexample-sno.yaml
tosite-1-sno.yaml
and modify the CR to match the details of the site and bare-metal host that you want to install, for example:Example single-node OpenShift cluster SiteConfig CR
Copy to Clipboard Copied! Toggle word wrap Toggle overflow - 1
- Create the
assisted-deployment-pull-secret
CR with the same namespace as theSiteConfig
CR. - 2
clusterImageSetNameRef
defines an image set available on the hub cluster. To see the list of supported versions on your hub cluster, runoc get clusterimagesets
.- 3
- Configure the SSH public key used to access the cluster.
- 4
- Cluster labels must correspond to the
bindingRules
field in thePolicyGenTemplate
CRs that you define. For example,policygentemplates/common-ranGen.yaml
applies to all clusters withcommon: true
set,policygentemplates/group-du-sno-ranGen.yaml
applies to all clusters withgroup-du-sno: ""
set. - 5
- Optional. The CR specifed under
KlusterletAddonConfig
is used to override the defaultKlusterletAddonConfig
that is created for the cluster. - 6
- For single-node deployments, define a single host. For three-node deployments, define three hosts. For standard deployments, define three hosts with
role: master
and two or more hosts defined withrole: worker
. - 7
- BMC address that you use to access the host. Applies to all cluster types.
- 8
- Name of the
bmh-secret
CR that you separately create with the host BMC credentials. When creating thebmh-secret
CR, use the same namespace as theSiteConfig
CR that provisions the host. - 9
- Configures the boot mode for the host. The default value is
UEFI
. UseUEFISecureBoot
to enable secure boot on the host. - 10
cpuset
must match the value set in the clusterPerformanceProfile
CRspec.cpu.reserved
field for workload partitioning.- 11
- Specifies the network settings for the node.
- 12
- Configures the IPv6 address for the host. For single-node OpenShift clusters with static IP addresses, the node-specific API and Ingress IPs should be the same.
Generate the day-0 installation CRs by processing the modified
SiteConfig
CRsite-1-sno.yaml
by running the following command:podman run -it --rm -v `pwd`/out/argocd/example/siteconfig:/resources:Z -v `pwd`/site-install:/output:Z,U registry.redhat.io/openshift4/ztp-site-generate-rhel8:v4.12.1 generator install site-1-sno.yaml /output
$ podman run -it --rm -v `pwd`/out/argocd/example/siteconfig:/resources:Z -v `pwd`/site-install:/output:Z,U registry.redhat.io/openshift4/ztp-site-generate-rhel8:v4.12.1 generator install site-1-sno.yaml /output
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Optional: Generate just the day-0
MachineConfig
installation CRs for a particular cluster type by processing the referenceSiteConfig
CR with the-E
option. For example, run the following commands:Create an output folder for the
MachineConfig
CRs:mkdir -p ./site-machineconfig
$ mkdir -p ./site-machineconfig
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Generate the
MachineConfig
installation CRs:podman run -it --rm -v `pwd`/out/argocd/example/siteconfig:/resources:Z -v `pwd`/site-machineconfig:/output:Z,U registry.redhat.io/openshift4/ztp-site-generate-rhel8:v4.12.1 generator install -E site-1-sno.yaml /output
$ podman run -it --rm -v `pwd`/out/argocd/example/siteconfig:/resources:Z -v `pwd`/site-machineconfig:/output:Z,U registry.redhat.io/openshift4/ztp-site-generate-rhel8:v4.12.1 generator install -E site-1-sno.yaml /output
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
site-machineconfig └── site-1-sno ├── site-1-sno_machineconfig_02-master-workload-partitioning.yaml ├── site-1-sno_machineconfig_predefined-extra-manifests-master.yaml └── site-1-sno_machineconfig_predefined-extra-manifests-worker.yaml
site-machineconfig └── site-1-sno ├── site-1-sno_machineconfig_02-master-workload-partitioning.yaml ├── site-1-sno_machineconfig_predefined-extra-manifests-master.yaml └── site-1-sno_machineconfig_predefined-extra-manifests-worker.yaml
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
Generate and export the day-2 configuration CRs using the reference
PolicyGenTemplate
CRs from the previous step. Run the following commands:Create an output folder for the day-2 CRs:
mkdir -p ./ref
$ mkdir -p ./ref
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Generate and export the day-2 configuration CRs:
podman run -it --rm -v `pwd`/out/argocd/example/policygentemplates:/resources:Z -v `pwd`/ref:/output:Z,U registry.redhat.io/openshift4/ztp-site-generate-rhel8:v4.12.1 generator config -N . /output
$ podman run -it --rm -v `pwd`/out/argocd/example/policygentemplates:/resources:Z -v `pwd`/ref:/output:Z,U registry.redhat.io/openshift4/ztp-site-generate-rhel8:v4.12.1 generator config -N . /output
Copy to Clipboard Copied! Toggle word wrap Toggle overflow The command generates example group and site-specific
PolicyGenTemplate
CRs for single-node OpenShift, three-node clusters, and standard clusters in the./ref
folder.Example output
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
- Use the generated CRs as the basis for the CRs that you use to install the cluster. You apply the installation CRs to the hub cluster as described in "Installing a single managed cluster". The configuration CRs can be applied to the cluster after cluster installation is complete.
19.5.2. Creating the managed bare-metal host secrets Copy linkLink copied to clipboard!
Add the required Secret
custom resources (CRs) for the managed bare-metal host to the hub cluster. You need a secret for the ZTP pipeline to access the Baseboard Management Controller (BMC) and a secret for the assisted installer service to pull cluster installation images from the registry.
The secrets are referenced from the SiteConfig
CR by name. The namespace must match the SiteConfig
namespace.
Procedure
Create a YAML secret file containing credentials for the host Baseboard Management Controller (BMC) and a pull secret required for installing OpenShift and all add-on cluster Operators:
Save the following YAML as the file
example-sno-secret.yaml
:Copy to Clipboard Copied! Toggle word wrap Toggle overflow
-
Add the relative path to
example-sno-secret.yaml
to thekustomization.yaml
file that you use to install the cluster.
19.5.3. Configuring Discovery ISO kernel arguments for manual installations using GitOps ZTP Copy linkLink copied to clipboard!
The GitOps ZTP workflow uses the Discovery ISO as part of the OpenShift Container Platform installation process on managed bare-metal hosts. You can edit the InfraEnv
resource to specify kernel arguments for the Discovery ISO. This is useful for cluster installations with specific environmental requirements. For example, configure the rd.net.timeout.carrier
kernel argument for the Discovery ISO to facilitate static networking for the cluster or to receive a DHCP address before downloading the root file system during installation.
In OpenShift Container Platform 4.12, you can only add kernel arguments. You can not replace or delete kernel arguments.
Prerequisites
- You have installed the OpenShift CLI (oc).
- You have logged in to the hub cluster as a user with cluster-admin privileges.
- You have manually generated the installation and configuration custom resources (CRs).
Procedure
-
Edit the
spec.kernelArguments
specification in theInfraEnv
CR to configure kernel arguments:
The SiteConfig
CR generates the InfraEnv
resource as part of the day-0 installation CRs.
Verification
To verify that the kernel arguments are applied, after the Discovery image verifies that OpenShift Container Platform is ready for installation, you can SSH to the target host before the installation process begins. At that point, you can view the kernel arguments for the Discovery ISO in the /proc/cmdline
file.
Begin an SSH session with the target host:
ssh -i /path/to/privatekey core@<host_name>
$ ssh -i /path/to/privatekey core@<host_name>
Copy to Clipboard Copied! Toggle word wrap Toggle overflow View the system’s kernel arguments by using the following command:
cat /proc/cmdline
$ cat /proc/cmdline
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
19.5.4. Installing a single managed cluster Copy linkLink copied to clipboard!
You can manually deploy a single managed cluster using the assisted service and Red Hat Advanced Cluster Management (RHACM).
Prerequisites
-
You have installed the OpenShift CLI (
oc
). -
You have logged in to the hub cluster as a user with
cluster-admin
privileges. -
You have created the baseboard management controller (BMC)
Secret
and the image pull-secretSecret
custom resources (CRs). See "Creating the managed bare-metal host secrets" for details. - Your target bare-metal host meets the networking and hardware requirements for managed clusters.
Procedure
Create a
ClusterImageSet
for each specific cluster version to be deployed, for exampleclusterImageSet-4.12.yaml
. AClusterImageSet
has the following format:Copy to Clipboard Copied! Toggle word wrap Toggle overflow Apply the
clusterImageSet
CR:oc apply -f clusterImageSet-4.12.yaml
$ oc apply -f clusterImageSet-4.12.yaml
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Create the
Namespace
CR in thecluster-namespace.yaml
file:Copy to Clipboard Copied! Toggle word wrap Toggle overflow Apply the
Namespace
CR by running the following command:oc apply -f cluster-namespace.yaml
$ oc apply -f cluster-namespace.yaml
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Apply the generated day-0 CRs that you extracted from the
ztp-site-generate
container and customized to meet your requirements:oc apply -R ./site-install/site-sno-1
$ oc apply -R ./site-install/site-sno-1
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
19.5.5. Monitoring the managed cluster installation status Copy linkLink copied to clipboard!
Ensure that cluster provisioning was successful by checking the cluster status.
Prerequisites
-
All of the custom resources have been configured and provisioned, and the
Agent
custom resource is created on the hub for the managed cluster.
Procedure
Check the status of the managed cluster:
oc get managedcluster
$ oc get managedcluster
Copy to Clipboard Copied! Toggle word wrap Toggle overflow True
indicates the managed cluster is ready.Check the agent status:
oc get agent -n <cluster_name>
$ oc get agent -n <cluster_name>
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Use the
describe
command to provide an in-depth description of the agent’s condition. Statuses to be aware of includeBackendError
,InputError
,ValidationsFailing
,InstallationFailed
, andAgentIsConnected
. These statuses are relevant to theAgent
andAgentClusterInstall
custom resources.oc describe agent -n <cluster_name>
$ oc describe agent -n <cluster_name>
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Check the cluster provisioning status:
oc get agentclusterinstall -n <cluster_name>
$ oc get agentclusterinstall -n <cluster_name>
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Use the
describe
command to provide an in-depth description of the cluster provisioning status:oc describe agentclusterinstall -n <cluster_name>
$ oc describe agentclusterinstall -n <cluster_name>
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Check the status of the managed cluster’s add-on services:
oc get managedclusteraddon -n <cluster_name>
$ oc get managedclusteraddon -n <cluster_name>
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Retrieve the authentication information of the
kubeconfig
file for the managed cluster:oc get secret -n <cluster_name> <cluster_name>-admin-kubeconfig -o jsonpath={.data.kubeconfig} | base64 -d > <directory>/<cluster_name>-kubeconfig
$ oc get secret -n <cluster_name> <cluster_name>-admin-kubeconfig -o jsonpath={.data.kubeconfig} | base64 -d > <directory>/<cluster_name>-kubeconfig
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
19.5.6. Troubleshooting the managed cluster Copy linkLink copied to clipboard!
Use this procedure to diagnose any installation issues that might occur with the managed cluster.
Procedure
Check the status of the managed cluster:
oc get managedcluster
$ oc get managedcluster
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
NAME HUB ACCEPTED MANAGED CLUSTER URLS JOINED AVAILABLE AGE SNO-cluster true True True 2d19h
NAME HUB ACCEPTED MANAGED CLUSTER URLS JOINED AVAILABLE AGE SNO-cluster true True True 2d19h
Copy to Clipboard Copied! Toggle word wrap Toggle overflow If the status in the
AVAILABLE
column isTrue
, the managed cluster is being managed by the hub.If the status in the
AVAILABLE
column isUnknown
, the managed cluster is not being managed by the hub. Use the following steps to continue checking to get more information.Check the
AgentClusterInstall
install status:oc get clusterdeployment -n <cluster_name>
$ oc get clusterdeployment -n <cluster_name>
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
NAME PLATFORM REGION CLUSTERTYPE INSTALLED INFRAID VERSION POWERSTATE AGE Sno0026 agent-baremetal false Initialized 2d14h
NAME PLATFORM REGION CLUSTERTYPE INSTALLED INFRAID VERSION POWERSTATE AGE Sno0026 agent-baremetal false Initialized 2d14h
Copy to Clipboard Copied! Toggle word wrap Toggle overflow If the status in the
INSTALLED
column isfalse
, the installation was unsuccessful.If the installation failed, enter the following command to review the status of the
AgentClusterInstall
resource:oc describe agentclusterinstall -n <cluster_name> <cluster_name>
$ oc describe agentclusterinstall -n <cluster_name> <cluster_name>
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Resolve the errors and reset the cluster:
Remove the cluster’s managed cluster resource:
oc delete managedcluster <cluster_name>
$ oc delete managedcluster <cluster_name>
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Remove the cluster’s namespace:
oc delete namespace <cluster_name>
$ oc delete namespace <cluster_name>
Copy to Clipboard Copied! Toggle word wrap Toggle overflow This deletes all of the namespace-scoped custom resources created for this cluster. You must wait for the
ManagedCluster
CR deletion to complete before proceeding.- Recreate the custom resources for the managed cluster.
19.5.7. RHACM generated cluster installation CRs reference Copy linkLink copied to clipboard!
Red Hat Advanced Cluster Management (RHACM) supports deploying OpenShift Container Platform on single-node clusters, three-node clusters, and standard clusters with a specific set of installation custom resources (CRs) that you generate using SiteConfig
CRs for each site.
Every managed cluster has its own namespace, and all of the installation CRs except for ManagedCluster
and ClusterImageSet
are under that namespace. ManagedCluster
and ClusterImageSet
are cluster-scoped, not namespace-scoped. The namespace and the CR names match the cluster name.
The following table lists the installation CRs that are automatically applied by the RHACM assisted service when it installs clusters using the SiteConfig
CRs that you configure.
CR | Description | Usage |
---|---|---|
| Contains the connection information for the Baseboard Management Controller (BMC) of the target bare-metal host. | Provides access to the BMC to load and start the discovery image on the target server by using the Redfish protocol. |
| Contains information for installing OpenShift Container Platform on the target bare-metal host. |
Used with |
|
Specifies details of the managed cluster configuration such as networking and the number of control plane nodes. Displays the cluster | Specifies the managed cluster configuration information and provides status during the installation of the cluster. |
|
References the |
Used with |
|
Provides network configuration information such as | Sets up a static IP address for the managed cluster’s Kube API server. |
| Contains hardware information about the target bare-metal host. | Created automatically on the hub when the target machine’s discovery image boots. |
| When a cluster is managed by the hub, it must be imported and known. This Kubernetes object provides that interface. | The hub uses this resource to manage and show the status of managed clusters. |
|
Contains the list of services provided by the hub to be deployed to the |
Tells the hub which addon services to deploy to the |
|
Logical space for |
Propagates resources to the |
|
Two CRs are created: |
|
| Contains OpenShift Container Platform image information such as the repository and image name. | Passed into resources to provide OpenShift Container Platform images. |
19.6. Recommended single-node OpenShift cluster configuration for vDU application workloads Copy linkLink copied to clipboard!
Use the following reference information to understand the single-node OpenShift configurations required to deploy virtual distributed unit (vDU) applications in the cluster. Configurations include cluster optimizations for high performance workloads, enabling workload partitioning, and minimizing the number of reboots required postinstallation.
19.6.1. Running low latency applications on OpenShift Container Platform Copy linkLink copied to clipboard!
OpenShift Container Platform enables low latency processing for applications running on commercial off-the-shelf (COTS) hardware by using several technologies and specialized hardware devices:
- Real-time kernel for RHCOS
- Ensures workloads are handled with a high degree of process determinism.
- CPU isolation
- Avoids CPU scheduling delays and ensures CPU capacity is available consistently.
- NUMA-aware topology management
- Aligns memory and huge pages with CPU and PCI devices to pin guaranteed container memory and huge pages to the non-uniform memory access (NUMA) node. Pod resources for all Quality of Service (QoS) classes stay on the same NUMA node. This decreases latency and improves performance of the node.
- Huge pages memory management
- Using huge page sizes improves system performance by reducing the amount of system resources required to access page tables.
- Precision timing synchronization using PTP
- Allows synchronization between nodes in the network with sub-microsecond accuracy.
19.6.2. Recommended cluster host requirements for vDU application workloads Copy linkLink copied to clipboard!
Running vDU application workloads requires a bare-metal host with sufficient resources to run OpenShift Container Platform services and production workloads.
Profile | vCPU | Memory | Storage |
---|---|---|---|
Minimum | 4 to 8 vCPU | 32GB of RAM | 120GB |
One vCPU equals one physical core. However, if you enable simultaneous multithreading (SMT), or Hyper-Threading, use the following formula to calculate the number of vCPUs that represent one physical core:
- (threads per core × cores) × sockets = vCPUs
The server must have a Baseboard Management Controller (BMC) when booting with virtual media.
19.6.3. Configuring host firmware for low latency and high performance Copy linkLink copied to clipboard!
Bare-metal hosts require the firmware to be configured before the host can be provisioned. The firmware configuration is dependent on the specific hardware and the particular requirements of your installation.
Procedure
-
Set the UEFI/BIOS Boot Mode to
UEFI
. - In the host boot sequence order, set Hard drive first.
Apply the specific firmware configuration for your hardware. The following table describes a representative firmware configuration for an Intel Xeon Skylake or Intel Cascade Lake server, based on the Intel FlexRAN 4G and 5G baseband PHY reference design.
ImportantThe exact firmware configuration depends on your specific hardware and network requirements. The following sample configuration is for illustrative purposes only.
Expand Table 19.7. Sample firmware configuration for an Intel Xeon Skylake or Cascade Lake server Firmware setting Configuration CPU Power and Performance Policy
Performance
Uncore Frequency Scaling
Disabled
Performance P-limit
Disabled
Enhanced Intel SpeedStep ® Tech
Enabled
Intel Configurable TDP
Enabled
Configurable TDP Level
Level 2
Intel® Turbo Boost Technology
Enabled
Energy Efficient Turbo
Disabled
Hardware P-States
Disabled
Package C-State
C0/C1 state
C1E
Disabled
Processor C6
Disabled
Enable global SR-IOV and VT-d settings in the firmware for the host. These settings are relevant to bare-metal environments.
19.6.4. Connectivity prerequisites for managed cluster networks Copy linkLink copied to clipboard!
Before you can install and provision a managed cluster with the zero touch provisioning (ZTP) GitOps pipeline, the managed cluster host must meet the following networking prerequisites:
- There must be bi-directional connectivity between the ZTP GitOps container in the hub cluster and the Baseboard Management Controller (BMC) of the target bare-metal host.
The managed cluster must be able to resolve and reach the API hostname of the hub hostname and
*.apps
hostname. Here is an example of the API hostname of the hub and*.apps
hostname:-
api.hub-cluster.internal.domain.com
-
console-openshift-console.apps.hub-cluster.internal.domain.com
-
The hub cluster must be able to resolve and reach the API and
*.apps
hostname of the managed cluster. Here is an example of the API hostname of the managed cluster and*.apps
hostname:-
api.sno-managed-cluster-1.internal.domain.com
-
console-openshift-console.apps.sno-managed-cluster-1.internal.domain.com
-
19.6.5. Workload partitioning in single-node OpenShift with GitOps ZTP Copy linkLink copied to clipboard!
Workload partitioning configures OpenShift Container Platform services, cluster management workloads, and infrastructure pods to run on a reserved number of host CPUs.
To configure workload partitioning with GitOps ZTP, you specify cluster management CPU resources with the cpuset
field of the SiteConfig
custom resource (CR) and the reserved
field of the group PolicyGenTemplate
CR. The GitOps ZTP pipeline uses these values to populate the required fields in the workload partitioning MachineConfig
CR (cpuset
) and the PerformanceProfile
CR (reserved
) that configure the single-node OpenShift cluster.
For maximum performance, ensure that the reserved
and isolated
CPU sets do not share CPU cores across NUMA zones.
-
The workload partitioning
MachineConfig
CR pins the OpenShift Container Platform infrastructure pods to a definedcpuset
configuration. -
The
PerformanceProfile
CR pins the systemd services to the reserved CPUs.
The value for the reserved
field specified in the PerformanceProfile
CR must match the cpuset
field in the workload partitioning MachineConfig
CR.
19.6.6. Recommended installation-time cluster configurations Copy linkLink copied to clipboard!
The ZTP pipeline applies the following custom resources (CRs) during cluster installation. These configuration CRs ensure that the cluster meets the feature and performance requirements necessary for running a vDU application.
When using the ZTP GitOps plugin and SiteConfig
CRs for cluster deployment, the following MachineConfig
CRs are included by default.
Use the SiteConfig
extraManifests
filter to alter the CRs that are included by default. For more information, see Advanced managed cluster configuration with SiteConfig CRs.
19.6.6.1. Workload partitioning Copy linkLink copied to clipboard!
Single-node OpenShift clusters that run DU workloads require workload partitioning. This limits the cores allowed to run platform services, maximizing the CPU core for application payloads.
Workload partitioning can only be enabled during cluster installation. You cannot disable workload partitioning postinstallation. However, you can reconfigure workload partitioning by updating the cpu
value that you define in the performance profile, and in the related MachineConfig
custom resource (CR).
The base64-encoded CR that enables workload partitioning contains the CPU set that the management workloads are constrained to. Encode host-specific values for
crio.conf
andkubelet.conf
in base64. Adjust the content to match the CPU set that is specified in the cluster performance profile. It must match the number of cores in the cluster host.Recommended workload partitioning configuration
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
When configured in the cluster host, the contents of
/etc/crio/crio.conf.d/01-workload-partitioning
should look like this:[crio.runtime.workloads.management] activation_annotation = "target.workload.openshift.io/management" annotation_prefix = "resources.workload.openshift.io" resources = { "cpushares" = 0, "cpuset" = "0-1,52-53" }
[crio.runtime.workloads.management] activation_annotation = "target.workload.openshift.io/management" annotation_prefix = "resources.workload.openshift.io" resources = { "cpushares" = 0, "cpuset" = "0-1,52-53" }
1 Copy to Clipboard Copied! Toggle word wrap Toggle overflow - 1
- The
cpuset
value varies based on the installation. If Hyper-Threading is enabled, specify both threads for each core. Thecpuset
value must match the reserved CPUs that you define in thespec.cpu.reserved
field in the performance profile.
When configured in the cluster, the contents of
/etc/kubernetes/openshift-workload-pinning
should look like this:{ "management": { "cpuset": "0-1,52-53" } }
{ "management": { "cpuset": "0-1,52-53"
1 } }
Copy to Clipboard Copied! Toggle word wrap Toggle overflow - 1
- The
cpuset
must match thecpuset
value in/etc/crio/crio.conf.d/01-workload-partitioning
.
Verification
Check that the applications and cluster system CPU pinning is correct. Run the following commands:
Open a remote shell connection to the managed cluster:
oc debug node/example-sno-1
$ oc debug node/example-sno-1
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Check that the OpenShift infrastructure applications CPU pinning is correct:
pgrep ovn | while read i; do taskset -cp $i; done
sh-4.4# pgrep ovn | while read i; do taskset -cp $i; done
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Check that the system applications CPU pinning is correct:
pgrep systemd | while read i; do taskset -cp $i; done
sh-4.4# pgrep systemd | while read i; do taskset -cp $i; done
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
pid 1's current affinity list: 0-1,52-53 pid 938's current affinity list: 0-1,52-53 pid 962's current affinity list: 0-1,52-53 pid 1197's current affinity list: 0-1,52-53
pid 1's current affinity list: 0-1,52-53 pid 938's current affinity list: 0-1,52-53 pid 962's current affinity list: 0-1,52-53 pid 1197's current affinity list: 0-1,52-53
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
19.6.6.2. Reduced platform management footprint Copy linkLink copied to clipboard!
To reduce the overall management footprint of the platform, a MachineConfig
custom resource (CR) is required that places all Kubernetes-specific mount points in a new namespace separate from the host operating system. The following base64-encoded example MachineConfig
CR illustrates this configuration.
Recommended container mount namespace configuration
19.6.6.3. SCTP Copy linkLink copied to clipboard!
Stream Control Transmission Protocol (SCTP) is a key protocol used in RAN applications. This MachineConfig
object adds the SCTP kernel module to the node to enable this protocol.
Recommended SCTP configuration
19.6.6.4. Accelerated container startup Copy linkLink copied to clipboard!
The following MachineConfig
CR configures core OpenShift processes and containers to use all available CPU cores during system startup and shutdown. This accelerates the system recovery during initial boot and reboots.
Recommended accelerated container startup configuration
19.6.6.5. Automatic kernel crash dumps with kdump Copy linkLink copied to clipboard!
The kdump
Linux kernel feature creates a kernel crash dump when the kernel crashes. The kdump
feature is enabled with the following MachineConfig
CRs.
Recommended MachineConfig
CR to remove ice driver from control plane kdump logs (05-kdump-config-master.yaml
)
Recommended control plane node kdump configuration (06-kdump-master.yaml
)
Recommended MachineConfig
CR to remove ice driver from worker node kdump logs (05-kdump-config-worker.yaml
)
Recommended kdump worker node configuration (06-kdump-worker.yaml
)
19.6.7. Recommended postinstallation cluster configurations Copy linkLink copied to clipboard!
When the cluster installation is complete, the ZTP pipeline applies the following custom resources (CRs) that are required to run DU workloads.
In {ztp} v4.10 and earlier, you configure UEFI secure boot with a MachineConfig
CR. This is no longer required in {ztp} v4.11 and later. In v4.11, you configure UEFI secure boot for single-node OpenShift clusters by updating the spec.clusters.nodes.bootMode
field in the SiteConfig
CR that you use to install the cluster. For more information, see Deploying a managed cluster with SiteConfig and {ztp}.
19.6.7.1. Operator namespaces and Operator groups Copy linkLink copied to clipboard!
Single-node OpenShift clusters that run DU workloads require the following OperatorGroup
and Namespace
custom resources (CRs):
- Local Storage Operator
- Logging Operator
- PTP Operator
- SR-IOV Network Operator
The following YAML summarizes these CRs:
Recommended Operator Namespace and OperatorGroup configuration
19.6.7.2. Operator subscriptions Copy linkLink copied to clipboard!
Single-node OpenShift clusters that run DU workloads require the following Subscription
CRs. The subscription provides the location to download the following Operators:
- Local Storage Operator
- Logging Operator
- PTP Operator
- SR-IOV Network Operator
Recommended Operator subscriptions
- 1
- Specify the channel to get the Operator from.
stable
is the recommended channel. - 2
- Specify
Manual
orAutomatic
. InAutomatic
mode, the Operator automatically updates to the latest versions in the channel as they become available in the registry. InManual
mode, new Operator versions are installed only after they are explicitly approved.
19.6.7.3. Cluster logging and log forwarding Copy linkLink copied to clipboard!
Single-node OpenShift clusters that run DU workloads require logging and log forwarding for debugging. The following example YAML illustrates the required ClusterLogging
and ClusterLogForwarder
CRs.
Recommended cluster logging and log forwarding configuration
19.6.7.4. Performance profile Copy linkLink copied to clipboard!
Single-node OpenShift clusters that run DU workloads require a Node Tuning Operator performance profile to use real-time host capabilities and services.
In earlier versions of OpenShift Container Platform, the Performance Addon Operator was used to implement automatic tuning to achieve low latency performance for OpenShift applications. In OpenShift Container Platform 4.11 and later, this functionality is part of the Node Tuning Operator.
The following example PerformanceProfile
CR illustrates the required cluster configuration.
Recommended performance profile configuration
- 1
- Ensure that the value for
name
matches that specified in thespec.profile.data
field ofTunedPerformancePatch.yaml
and thestatus.configuration.source.name
field ofvalidatorCRs/informDuValidator.yaml
. - 2
- Configures UEFI secure boot for the cluster host.
- 3
- Set the isolated CPUs. Ensure all of the Hyper-Threading pairs match.Important
The reserved and isolated CPU pools must not overlap and together must span all available cores. CPU cores that are not accounted for cause an undefined behaviour in the system.
- 4
- Set the reserved CPUs. When workload partitioning is enabled, system processes, kernel threads, and system container threads are restricted to these CPUs. All CPUs that are not isolated should be reserved.
- 5
- Set the number of huge pages.
- 6
- Set the huge page size.
- 7
- Set
node
to the NUMA node where thehugepages
are allocated. - 8
- Set
enabled
totrue
to install the real-time Linux kernel.
19.6.7.5. Configuring cluster time synchronization Copy linkLink copied to clipboard!
Run a one-time system time synchronization job for control plane or worker nodes.
Recommended one time time-sync for control plane nodes (99-sync-time-once-master.yaml
)
Recommended one time time-sync for worker nodes (99-sync-time-once-worker.yaml
)
19.6.7.6. PTP Copy linkLink copied to clipboard!
Single-node OpenShift clusters use Precision Time Protocol (PTP) for network time synchronization. The following example PtpConfig
CR illustrates the required PTP configuration for ordinary clocks. The exact configuration you apply will depend on the node hardware and specific use case.
Recommended PTP configuration
19.6.7.7. Extended Tuned profile Copy linkLink copied to clipboard!
Single-node OpenShift clusters that run DU workloads require additional performance tuning configurations necessary for high-performance workloads. The following example Tuned
CR extends the Tuned
profile:
Recommended extended Tuned profile configuration
19.6.7.8. SR-IOV Copy linkLink copied to clipboard!
Single root I/O virtualization (SR-IOV) is commonly used to enable the fronthaul and the midhaul networks. The following YAML example configures SR-IOV for a single-node OpenShift cluster.
Recommended SR-IOV configuration
- 1
- Specifies the VLAN for the midhaul network.
- 2
- Select either
vfio-pci
ornetdevice
, as needed. - 3
- Specifies the interface connected to the midhaul network.
- 4
- Specifies the number of VFs for the midhaul network.
- 5
- The VLAN for the fronthaul network.
- 6
- Select either
vfio-pci
ornetdevice
, as needed. - 7
- Specifies the interface connected to the fronthaul network.
- 8
- Specifies the number of VFs for the fronthaul network.
19.6.7.9. Console Operator Copy linkLink copied to clipboard!
The console-operator installs and maintains the web console on a cluster. When the node is centrally managed the Operator is not needed and makes space for application workloads. The following Console
custom resource (CR) example disables the console.
Recommended console configuration
19.6.7.10. Alertmanager Copy linkLink copied to clipboard!
Single-node OpenShift clusters that run DU workloads require reduced CPU resources consumed by the OpenShift Container Platform monitoring components. The following ConfigMap
custom resource (CR) disables Alertmanager.
Recommended cluster monitoring configuration
19.6.7.11. Operator Lifecycle Manager Copy linkLink copied to clipboard!
Single-node OpenShift clusters that run distributed unit workloads require consistent access to CPU resources. Operator Lifecycle Manager (OLM) collects performance data from Operators at regular intervals, resulting in an increase in CPU utilisation. The following ConfigMap
custom resource (CR) disables the collection of Operator performance data by OLM.
Recommended cluster OLM configuration (ReduceOLMFootprint.yaml
)
19.6.7.12. Network diagnostics Copy linkLink copied to clipboard!
Single-node OpenShift clusters that run DU workloads require less inter-pod network connectivity checks to reduce the additional load created by these pods. The following custom resource (CR) disables these checks.
Recommended network diagnostics configuration
19.7. Validating single-node OpenShift cluster tuning for vDU application workloads Copy linkLink copied to clipboard!
Before you can deploy virtual distributed unit (vDU) applications, you need to tune and configure the cluster host firmware and various other cluster configuration settings. Use the following information to validate the cluster configuration to support vDU workloads.
19.7.1. Recommended firmware configuration for vDU cluster hosts Copy linkLink copied to clipboard!
Use the following table as the basis to configure the cluster host firmware for vDU applications running on OpenShift Container Platform 4.12.
The following table is a general recommendation for vDU cluster host firmware configuration. Exact firmware settings will depend on your requirements and specific hardware platform. Automatic setting of firmware is not handled by the zero touch provisioning pipeline.
Firmware setting | Configuration | Description |
---|---|---|
HyperTransport (HT) | Enabled | HyperTransport (HT) bus is a bus technology developed by AMD. HT provides a high-speed link between the components in the host memory and other system peripherals. |
UEFI | Enabled | Enable booting from UEFI for the vDU host. |
CPU Power and Performance Policy | Performance | Set CPU Power and Performance Policy to optimize the system for performance over energy efficiency. |
Uncore Frequency Scaling | Disabled | Disable Uncore Frequency Scaling to prevent the voltage and frequency of non-core parts of the CPU from being set independently. |
Uncore Frequency | Maximum | Sets the non-core parts of the CPU such as cache and memory controller to their maximum possible frequency of operation. |
Performance P-limit | Disabled | Disable Performance P-limit to prevent the Uncore frequency coordination of processors. |
Enhanced Intel® SpeedStep Tech | Enabled | Enable Enhanced Intel SpeedStep to allow the system to dynamically adjust processor voltage and core frequency that decreases power consumption and heat production in the host. |
Intel® Turbo Boost Technology | Enabled | Enable Turbo Boost Technology for Intel-based CPUs to automatically allow processor cores to run faster than the rated operating frequency if they are operating below power, current, and temperature specification limits. |
Intel Configurable TDP | Enabled | Enables Thermal Design Power (TDP) for the CPU. |
Configurable TDP Level | Level 2 | TDP level sets the CPU power consumption required for a particular performance rating. TDP level 2 sets the CPU to the most stable performance level at the cost of power consumption. |
Energy Efficient Turbo | Disabled | Disable Energy Efficient Turbo to prevent the processor from using an energy-efficiency based policy. |
Hardware P-States | Enabled or Disabled |
Enable OS-controlled P-States to allow power saving configurations. Disable |
Package C-State | C0/C1 state | Use C0 or C1 states to set the processor to a fully active state (C0) or to stop CPU internal clocks running in software (C1). |
C1E | Disabled | CPU Enhanced Halt (C1E) is a power saving feature in Intel chips. Disabling C1E prevents the operating system from sending a halt command to the CPU when inactive. |
Processor C6 | Disabled | C6 power-saving is a CPU feature that automatically disables idle CPU cores and cache. Disabling C6 improves system performance. |
Sub-NUMA Clustering | Disabled | Sub-NUMA clustering divides the processor cores, cache, and memory into multiple NUMA domains. Disabling this option can increase performance for latency-sensitive workloads. |
Enable global SR-IOV and VT-d settings in the firmware for the host. These settings are relevant to bare-metal environments.
Enable both C-states
and OS-controlled P-States
to allow per pod power management.
19.7.2. Recommended cluster configurations to run vDU applications Copy linkLink copied to clipboard!
Clusters running virtualized distributed unit (vDU) applications require a highly tuned and optimized configuration. The following information describes the various elements that you require to support vDU workloads in OpenShift Container Platform 4.12 clusters.
19.7.2.1. Recommended cluster MachineConfig CRs Copy linkLink copied to clipboard!
Check that the MachineConfig
custom resources (CRs) that you extract from the ztp-site-generate
container are applied in the cluster. The CRs can be found in the extracted out/source-crs/extra-manifest/
folder.
The following MachineConfig
CRs from the ztp-site-generate
container configure the cluster host:
CR filename | Description |
---|---|
|
Configures workload partitioning for the cluster. Apply this |
|
Loads the SCTP kernel module. These |
| Configures the container mount namespace and Kubelet configuration. |
| Configures accelerated startup for the cluster. |
|
Configures |
19.7.2.2. Recommended cluster Operators Copy linkLink copied to clipboard!
The following Operators are required for clusters running virtualized distributed unit (vDU) applications and are a part of the baseline reference configuration:
- Node Tuning Operator (NTO). NTO packages functionality that was previously delivered with the Performance Addon Operator, which is now a part of NTO.
- PTP Operator
- SR-IOV Network Operator
- Red Hat OpenShift Logging Operator
- Local Storage Operator
19.7.2.3. Recommended cluster kernel configuration Copy linkLink copied to clipboard!
Always use the latest supported real-time kernel version in your cluster. Ensure that you apply the following configurations in the cluster:
Ensure that the following
additionalKernelArgs
are set in the cluster performance profile:spec: additionalKernelArgs: - "rcupdate.rcu_normal_after_boot=0" - "efi=runtime"
spec: additionalKernelArgs: - "rcupdate.rcu_normal_after_boot=0" - "efi=runtime"
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Ensure that the
performance-patch
profile in theTuned
CR configures the correct CPU isolation set that matches theisolated
CPU set in the relatedPerformanceProfile
CR, for example:Copy to Clipboard Copied! Toggle word wrap Toggle overflow - 1
- Listed CPUs depend on the host hardware configuration, specifically the number of available CPUs in the system and the CPU topology.
19.7.2.4. Checking the realtime kernel version Copy linkLink copied to clipboard!
Always use the latest version of the realtime kernel in your OpenShift Container Platform clusters. If you are unsure about the kernel version that is in use in the cluster, you can compare the current realtime kernel version to the release version with the following procedure.
Prerequisites
-
You have installed the OpenShift CLI (
oc
). -
You are logged in as a user with
cluster-admin
privileges. -
You have installed
podman
.
Procedure
Run the following command to get the cluster version:
OCP_VERSION=$(oc get clusterversion version -o jsonpath='{.status.desired.version}{"\n"}')
$ OCP_VERSION=$(oc get clusterversion version -o jsonpath='{.status.desired.version}{"\n"}')
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Get the release image SHA number:
DTK_IMAGE=$(oc adm release info --image-for=driver-toolkit quay.io/openshift-release-dev/ocp-release:$OCP_VERSION-x86_64)
$ DTK_IMAGE=$(oc adm release info --image-for=driver-toolkit quay.io/openshift-release-dev/ocp-release:$OCP_VERSION-x86_64)
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Run the release image container and extract the kernel version that is packaged with cluster’s current release:
podman run --rm $DTK_IMAGE rpm -qa | grep 'kernel-rt-core-' | sed 's#kernel-rt-core-##'
$ podman run --rm $DTK_IMAGE rpm -qa | grep 'kernel-rt-core-' | sed 's#kernel-rt-core-##'
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
4.18.0-305.49.1.rt7.121.el8_4.x86_64
4.18.0-305.49.1.rt7.121.el8_4.x86_64
Copy to Clipboard Copied! Toggle word wrap Toggle overflow This is the default realtime kernel version that ships with the release.
NoteThe realtime kernel is denoted by the string
.rt
in the kernel version.
Verification
Check that the kernel version listed for the cluster’s current release matches actual realtime kernel that is running in the cluster. Run the following commands to check the running realtime kernel version:
Open a remote shell connection to the cluster node:
oc debug node/<node_name>
$ oc debug node/<node_name>
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Check the realtime kernel version:
uname -r
sh-4.4# uname -r
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
4.18.0-305.49.1.rt7.121.el8_4.x86_64
4.18.0-305.49.1.rt7.121.el8_4.x86_64
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
19.7.3. Checking that the recommended cluster configurations are applied Copy linkLink copied to clipboard!
You can check that clusters are running the correct configuration. The following procedure describes how to check the various configurations that you require to deploy a DU application in OpenShift Container Platform 4.12 clusters.
Prerequisites
- You have deployed a cluster and tuned it for vDU workloads.
-
You have installed the OpenShift CLI (
oc
). -
You have logged in as a user with
cluster-admin
privileges.
Procedure
Check that the default OperatorHub sources are disabled. Run the following command:
oc get operatorhub cluster -o yaml
$ oc get operatorhub cluster -o yaml
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
spec: disableAllDefaultSources: true
spec: disableAllDefaultSources: true
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Check that all required
CatalogSource
resources are annotated for workload partitioning (PreferredDuringScheduling
) by running the following command:oc get catalogsource -A -o jsonpath='{range .items[*]}{.metadata.name}{" -- "}{.metadata.annotations.target\.workload\.openshift\.io/management}{"\n"}{end}'
$ oc get catalogsource -A -o jsonpath='{range .items[*]}{.metadata.name}{" -- "}{.metadata.annotations.target\.workload\.openshift\.io/management}{"\n"}{end}'
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
certified-operators -- {"effect": "PreferredDuringScheduling"} community-operators -- {"effect": "PreferredDuringScheduling"} ran-operators redhat-marketplace -- {"effect": "PreferredDuringScheduling"} redhat-operators -- {"effect": "PreferredDuringScheduling"}
certified-operators -- {"effect": "PreferredDuringScheduling"} community-operators -- {"effect": "PreferredDuringScheduling"} ran-operators
1 redhat-marketplace -- {"effect": "PreferredDuringScheduling"} redhat-operators -- {"effect": "PreferredDuringScheduling"}
Copy to Clipboard Copied! Toggle word wrap Toggle overflow - 1
CatalogSource
resources that are not annotated are also returned. In this example, theran-operators
CatalogSource
resource is not annotated and does not have thePreferredDuringScheduling
annotation.
NoteIn a properly configured vDU cluster, only a single annotated catalog source is listed.
Check that all applicable OpenShift Container Platform Operator namespaces are annotated for workload partitioning. This includes all Operators installed with core OpenShift Container Platform and the set of additional Operators included in the reference DU tuning configuration. Run the following command:
oc get namespaces -A -o jsonpath='{range .items[*]}{.metadata.name}{" -- "}{.metadata.annotations.workload\.openshift\.io/allowed}{"\n"}{end}'
$ oc get namespaces -A -o jsonpath='{range .items[*]}{.metadata.name}{" -- "}{.metadata.annotations.workload\.openshift\.io/allowed}{"\n"}{end}'
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
default -- openshift-apiserver -- management openshift-apiserver-operator -- management openshift-authentication -- management openshift-authentication-operator -- management
default -- openshift-apiserver -- management openshift-apiserver-operator -- management openshift-authentication -- management openshift-authentication-operator -- management
Copy to Clipboard Copied! Toggle word wrap Toggle overflow ImportantAdditional Operators must not be annotated for workload partitioning. In the output from the previous command, additional Operators should be listed without any value on the right side of the
--
separator.Check that the
ClusterLogging
configuration is correct. Run the following commands:Validate that the appropriate input and output logs are configured:
oc get -n openshift-logging ClusterLogForwarder instance -o yaml
$ oc get -n openshift-logging ClusterLogForwarder instance -o yaml
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Check that the curation schedule is appropriate for your application:
oc get -n openshift-logging clusterloggings.logging.openshift.io instance -o yaml
$ oc get -n openshift-logging clusterloggings.logging.openshift.io instance -o yaml
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
Check that the web console is disabled (
managementState: Removed
) by running the following command:oc get consoles.operator.openshift.io cluster -o jsonpath="{ .spec.managementState }"
$ oc get consoles.operator.openshift.io cluster -o jsonpath="{ .spec.managementState }"
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
Removed
Removed
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Check that
chronyd
is disabled on the cluster node by running the following commands:oc debug node/<node_name>
$ oc debug node/<node_name>
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Check the status of
chronyd
on the node:chroot /host
sh-4.4# chroot /host
Copy to Clipboard Copied! Toggle word wrap Toggle overflow systemctl status chronyd
sh-4.4# systemctl status chronyd
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
● chronyd.service - NTP client/server Loaded: loaded (/usr/lib/systemd/system/chronyd.service; disabled; vendor preset: enabled) Active: inactive (dead) Docs: man:chronyd(8) man:chrony.conf(5)
● chronyd.service - NTP client/server Loaded: loaded (/usr/lib/systemd/system/chronyd.service; disabled; vendor preset: enabled) Active: inactive (dead) Docs: man:chronyd(8) man:chrony.conf(5)
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Check that the PTP interface is successfully synchronized to the primary clock using a remote shell connection to the
linuxptp-daemon
container and the PTP Management Client (pmc
) tool:Set the
$PTP_POD_NAME
variable with the name of thelinuxptp-daemon
pod by running the following command:PTP_POD_NAME=$(oc get pods -n openshift-ptp -l app=linuxptp-daemon -o name)
$ PTP_POD_NAME=$(oc get pods -n openshift-ptp -l app=linuxptp-daemon -o name)
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Run the following command to check the sync status of the PTP device:
oc -n openshift-ptp rsh -c linuxptp-daemon-container ${PTP_POD_NAME} pmc -u -f /var/run/ptp4l.0.config -b 0 'GET PORT_DATA_SET'
$ oc -n openshift-ptp rsh -c linuxptp-daemon-container ${PTP_POD_NAME} pmc -u -f /var/run/ptp4l.0.config -b 0 'GET PORT_DATA_SET'
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Run the following
pmc
command to check the PTP clock status:oc -n openshift-ptp rsh -c linuxptp-daemon-container ${PTP_POD_NAME} pmc -u -f /var/run/ptp4l.0.config -b 0 'GET TIME_STATUS_NP'
$ oc -n openshift-ptp rsh -c linuxptp-daemon-container ${PTP_POD_NAME} pmc -u -f /var/run/ptp4l.0.config -b 0 'GET TIME_STATUS_NP'
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Check that the expected
master offset
value corresponding to the value in/var/run/ptp4l.0.config
is found in thelinuxptp-daemon-container
log:oc logs $PTP_POD_NAME -n openshift-ptp -c linuxptp-daemon-container
$ oc logs $PTP_POD_NAME -n openshift-ptp -c linuxptp-daemon-container
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
phc2sys[56020.341]: [ptp4l.1.config] CLOCK_REALTIME phc offset -1731092 s2 freq -1546242 delay 497 ptp4l[56020.390]: [ptp4l.1.config] master offset -2 s2 freq -5863 path delay 541 ptp4l[56020.390]: [ptp4l.0.config] master offset -8 s2 freq -10699 path delay 533
phc2sys[56020.341]: [ptp4l.1.config] CLOCK_REALTIME phc offset -1731092 s2 freq -1546242 delay 497 ptp4l[56020.390]: [ptp4l.1.config] master offset -2 s2 freq -5863 path delay 541 ptp4l[56020.390]: [ptp4l.0.config] master offset -8 s2 freq -10699 path delay 533
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
Check that the SR-IOV configuration is correct by running the following commands:
Check that the
disableDrain
value in theSriovOperatorConfig
resource is set totrue
:oc get sriovoperatorconfig -n openshift-sriov-network-operator default -o jsonpath="{.spec.disableDrain}{'\n'}"
$ oc get sriovoperatorconfig -n openshift-sriov-network-operator default -o jsonpath="{.spec.disableDrain}{'\n'}"
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
true
true
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Check that the
SriovNetworkNodeState
sync status isSucceeded
by running the following command:oc get SriovNetworkNodeStates -n openshift-sriov-network-operator -o jsonpath="{.items[*].status.syncStatus}{'\n'}"
$ oc get SriovNetworkNodeStates -n openshift-sriov-network-operator -o jsonpath="{.items[*].status.syncStatus}{'\n'}"
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
Succeeded
Succeeded
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Verify that the expected number and configuration of virtual functions (
Vfs
) under each interface configured for SR-IOV is present and correct in the.status.interfaces
field. For example:oc get SriovNetworkNodeStates -n openshift-sriov-network-operator -o yaml
$ oc get SriovNetworkNodeStates -n openshift-sriov-network-operator -o yaml
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
Check that the cluster performance profile is correct. The
cpu
andhugepages
sections will vary depending on your hardware configuration. Run the following command:oc get PerformanceProfile openshift-node-performance-profile -o yaml
$ oc get PerformanceProfile openshift-node-performance-profile -o yaml
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
Copy to Clipboard Copied! Toggle word wrap Toggle overflow NoteCPU settings are dependent on the number of cores available on the server and should align with workload partitioning settings.
hugepages
configuration is server and application dependent.Check that the
PerformanceProfile
was successfully applied to the cluster by running the following command:oc get performanceprofile openshift-node-performance-profile -o jsonpath="{range .status.conditions[*]}{ @.type }{' -- '}{@.status}{'\n'}{end}"
$ oc get performanceprofile openshift-node-performance-profile -o jsonpath="{range .status.conditions[*]}{ @.type }{' -- '}{@.status}{'\n'}{end}"
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
Available -- True Upgradeable -- True Progressing -- False Degraded -- False
Available -- True Upgradeable -- True Progressing -- False Degraded -- False
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Check the
Tuned
performance patch settings by running the following command:oc get tuneds.tuned.openshift.io -n openshift-cluster-node-tuning-operator performance-patch -o yaml
$ oc get tuneds.tuned.openshift.io -n openshift-cluster-node-tuning-operator performance-patch -o yaml
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
Copy to Clipboard Copied! Toggle word wrap Toggle overflow - 1
- The cpu list in
cmdline=nohz_full=
will vary based on your hardware configuration.
Check that cluster networking diagnostics are disabled by running the following command:
oc get networks.operator.openshift.io cluster -o jsonpath='{.spec.disableNetworkDiagnostics}'
$ oc get networks.operator.openshift.io cluster -o jsonpath='{.spec.disableNetworkDiagnostics}'
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
true
true
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Check that the
Kubelet
housekeeping interval is tuned to slower rate. This is set in thecontainerMountNS
machine config. Run the following command:oc describe machineconfig container-mount-namespace-and-kubelet-conf-master | grep OPENSHIFT_MAX_HOUSEKEEPING_INTERVAL_DURATION
$ oc describe machineconfig container-mount-namespace-and-kubelet-conf-master | grep OPENSHIFT_MAX_HOUSEKEEPING_INTERVAL_DURATION
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
Environment="OPENSHIFT_MAX_HOUSEKEEPING_INTERVAL_DURATION=60s"
Environment="OPENSHIFT_MAX_HOUSEKEEPING_INTERVAL_DURATION=60s"
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Check that Grafana and
alertManagerMain
are disabled and that the Prometheus retention period is set to 24h by running the following command:oc get configmap cluster-monitoring-config -n openshift-monitoring -o jsonpath="{ .data.config\.yaml }"
$ oc get configmap cluster-monitoring-config -n openshift-monitoring -o jsonpath="{ .data.config\.yaml }"
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Use the following commands to verify that Grafana and
alertManagerMain
routes are not found in the cluster:oc get route -n openshift-monitoring alertmanager-main
$ oc get route -n openshift-monitoring alertmanager-main
Copy to Clipboard Copied! Toggle word wrap Toggle overflow oc get route -n openshift-monitoring grafana
$ oc get route -n openshift-monitoring grafana
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Both queries should return
Error from server (NotFound)
messages.
Check that there is a minimum of 4 CPUs allocated as
reserved
for each of thePerformanceProfile
,Tuned
performance-patch, workload partitioning, and kernel command-line arguments by running the following command:oc get performanceprofile -o jsonpath="{ .items[0].spec.cpu.reserved }"
$ oc get performanceprofile -o jsonpath="{ .items[0].spec.cpu.reserved }"
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
0-3
0-3
Copy to Clipboard Copied! Toggle word wrap Toggle overflow NoteDepending on your workload requirements, you might require additional reserved CPUs to be allocated.
19.8. Advanced managed cluster configuration with SiteConfig resources Copy linkLink copied to clipboard!
You can use SiteConfig
custom resources (CRs) to deploy custom functionality and configurations in your managed clusters at installation time.
19.8.1. Customizing extra installation manifests in the ZTP GitOps pipeline Copy linkLink copied to clipboard!
You can define a set of extra manifests for inclusion in the installation phase of the zero touch provisioning (ZTP) GitOps pipeline. These manifests are linked to the SiteConfig
custom resources (CRs) and are applied to the cluster during installation. Including MachineConfig
CRs at install time makes the installation process more efficient.
Prerequisites
- Create a Git repository where you manage your custom site configuration data. The repository must be accessible from the hub cluster and be defined as a source repository for the Argo CD application.
Procedure
- Create a set of extra manifest CRs that the ZTP pipeline uses to customize the cluster installs.
In your custom
/siteconfig
directory, create an/extra-manifest
folder for your extra manifests. The following example illustrates a sample/siteconfig
with/extra-manifest
folder:siteconfig ├── site1-sno-du.yaml ├── site2-standard-du.yaml └── extra-manifest └── 01-example-machine-config.yaml
siteconfig ├── site1-sno-du.yaml ├── site2-standard-du.yaml └── extra-manifest └── 01-example-machine-config.yaml
Copy to Clipboard Copied! Toggle word wrap Toggle overflow -
Add your custom extra manifest CRs to the
siteconfig/extra-manifest
directory. In your
SiteConfig
CR, enter the directory name in theextraManifestPath
field, for example:clusters: - clusterName: "example-sno" networkType: "OVNKubernetes" extraManifestPath: extra-manifest
clusters: - clusterName: "example-sno" networkType: "OVNKubernetes" extraManifestPath: extra-manifest
Copy to Clipboard Copied! Toggle word wrap Toggle overflow -
Save the
SiteConfig
CRs and/extra-manifest
CRs and push them to the site configuration repo.
The ZTP pipeline appends the CRs in the /extra-manifest
directory to the default set of extra manifests during cluster provisioning.
19.8.2. Filtering custom resources using SiteConfig filters Copy linkLink copied to clipboard!
By using filters, you can easily customize SiteConfig
custom resources (CRs) to include or exclude other CRs for use in the installation phase of the zero touch provisioning (ZTP) GitOps pipeline.
You can specify an inclusionDefault
value of include
or exclude
for the SiteConfig
CR, along with a list of the specific extraManifest
RAN CRs that you want to include or exclude. Setting inclusionDefault
to include
makes the ZTP pipeline apply all the files in /source-crs/extra-manifest
during installation. Setting inclusionDefault
to exclude
does the opposite.
You can exclude individual CRs from the /source-crs/extra-manifest
folder that are otherwise included by default. The following example configures a custom single-node OpenShift SiteConfig
CR to exclude the /source-crs/extra-manifest/03-sctp-machine-config-worker.yaml
CR at installation time.
Some additional optional filtering scenarios are also described.
Prerequisites
- You configured the hub cluster for generating the required installation and policy CRs.
- You created a Git repository where you manage your custom site configuration data. The repository must be accessible from the hub cluster and be defined as a source repository for the Argo CD application.
Procedure
To prevent the ZTP pipeline from applying the
03-sctp-machine-config-worker.yaml
CR file, apply the following YAML in theSiteConfig
CR:Copy to Clipboard Copied! Toggle word wrap Toggle overflow The ZTP pipeline skips the
03-sctp-machine-config-worker.yaml
CR during installation. All other CRs in/source-crs/extra-manifest
are applied.Save the
SiteConfig
CR and push the changes to the site configuration repository.The ZTP pipeline monitors and adjusts what CRs it applies based on the
SiteConfig
filter instructions.Optional: To prevent the ZTP pipeline from applying all the
/source-crs/extra-manifest
CRs during cluster installation, apply the following YAML in theSiteConfig
CR:- clusterName: "site1-sno-du" extraManifests: filter: inclusionDefault: exclude
- clusterName: "site1-sno-du" extraManifests: filter: inclusionDefault: exclude
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Optional: To exclude all the
/source-crs/extra-manifest
RAN CRs and instead include a custom CR file during installation, edit the customSiteConfig
CR to set the custom manifests folder and theinclude
file, for example:Copy to Clipboard Copied! Toggle word wrap Toggle overflow The following example illustrates the custom folder structure:
siteconfig ├── site1-sno-du.yaml └── user-custom-manifest └── custom-sctp-machine-config-worker.yaml
siteconfig ├── site1-sno-du.yaml └── user-custom-manifest └── custom-sctp-machine-config-worker.yaml
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
19.9. Advanced managed cluster configuration with PolicyGenTemplate resources Copy linkLink copied to clipboard!
You can use PolicyGenTemplate
CRs to deploy custom functionality in your managed clusters.
19.9.1. Deploying additional changes to clusters Copy linkLink copied to clipboard!
If you require cluster configuration changes outside of the base GitOps ZTP pipeline configuration, there are three options:
- Apply the additional configuration after the ZTP pipeline is complete
- When the GitOps ZTP pipeline deployment is complete, the deployed cluster is ready for application workloads. At this point, you can install additional Operators and apply configurations specific to your requirements. Ensure that additional configurations do not negatively affect the performance of the platform or allocated CPU budget.
- Add content to the ZTP library
- The base source custom resources (CRs) that you deploy with the GitOps ZTP pipeline can be augmented with custom content as required.
- Create extra manifests for the cluster installation
- Extra manifests are applied during installation and make the installation process more efficient.
Providing additional source CRs or modifying existing source CRs can significantly impact the performance or CPU profile of OpenShift Container Platform.
19.9.2. Using PolicyGenTemplate CRs to override source CRs content Copy linkLink copied to clipboard!
PolicyGenTemplate
custom resources (CRs) allow you to overlay additional configuration details on top of the base source CRs provided with the GitOps plugin in the ztp-site-generate
container. You can think of PolicyGenTemplate
CRs as a logical merge or patch to the base CR. Use PolicyGenTemplate
CRs to update a single field of the base CR, or overlay the entire contents of the base CR. You can update values and insert fields that are not in the base CR.
The following example procedure describes how to update fields in the generated PerformanceProfile
CR for the reference configuration based on the PolicyGenTemplate
CR in the group-du-sno-ranGen.yaml
file. Use the procedure as a basis for modifying other parts of the PolicyGenTemplate
based on your requirements.
Prerequisites
- Create a Git repository where you manage your custom site configuration data. The repository must be accessible from the hub cluster and be defined as a source repository for Argo CD.
Procedure
Review the baseline source CR for existing content. You can review the source CRs listed in the reference
PolicyGenTemplate
CRs by extracting them from the zero touch provisioning (ZTP) container.Create an
/out
folder:mkdir -p ./out
$ mkdir -p ./out
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Extract the source CRs:
podman run --log-driver=none --rm registry.redhat.io/openshift4/ztp-site-generate-rhel8:v4.12.1 extract /home/ztp --tar | tar x -C ./out
$ podman run --log-driver=none --rm registry.redhat.io/openshift4/ztp-site-generate-rhel8:v4.12.1 extract /home/ztp --tar | tar x -C ./out
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
Review the baseline
PerformanceProfile
CR in./out/source-crs/PerformanceProfile.yaml
:Copy to Clipboard Copied! Toggle word wrap Toggle overflow NoteAny fields in the source CR which contain
$…
are removed from the generated CR if they are not provided in thePolicyGenTemplate
CR.Update the
PolicyGenTemplate
entry forPerformanceProfile
in thegroup-du-sno-ranGen.yaml
reference file. The following examplePolicyGenTemplate
CR stanza supplies appropriate CPU specifications, sets thehugepages
configuration, and adds a new field that setsgloballyDisableIrqLoadBalancing
to false.Copy to Clipboard Copied! Toggle word wrap Toggle overflow -
Commit the
PolicyGenTemplate
change in Git, and then push to the Git repository being monitored by the GitOps ZTP argo CD application.
Example output
The ZTP application generates an RHACM policy that contains the generated PerformanceProfile
CR. The contents of that CR are derived by merging the metadata
and spec
contents from the PerformanceProfile
entry in the PolicyGenTemplate
onto the source CR. The resulting CR has the following content:
In the /source-crs
folder that you extract from the ztp-site-generate
container, the $
syntax is not used for template substitution as implied by the syntax. Rather, if the policyGen
tool sees the $
prefix for a string and you do not specify a value for that field in the related PolicyGenTemplate
CR, the field is omitted from the output CR entirely.
An exception to this is the $mcp
variable in /source-crs
YAML files that is substituted with the specified value for mcp
from the PolicyGenTemplate
CR. For example, in example/policygentemplates/group-du-standard-ranGen.yaml
, the value for mcp
is worker
:
spec: bindingRules: group-du-standard: "" mcp: "worker"
spec:
bindingRules:
group-du-standard: ""
mcp: "worker"
The policyGen
tool replace instances of $mcp
with worker
in the output CRs.
19.9.3. Adding custom content to the GitOps ZTP pipeline Copy linkLink copied to clipboard!
Perform the following procedure to add new content to the ZTP pipeline.
Procedure
-
Create a subdirectory named
source-crs
in the directory that contains thekustomization.yaml
file for thePolicyGenTemplate
custom resource (CR). Add your custom CRs to the
source-crs
subdirectory, as shown in the following example:Copy to Clipboard Copied! Toggle word wrap Toggle overflow - 1
- The
source-crs
subdirectory must be in the same directory as thekustomization.yaml
file.
ImportantTo use your own resources, ensure that the custom CR names differ from the default source CRs provided in the ZTP container.
Update the required
PolicyGenTemplate
CRs to include references to the content you added in thesource-crs/custom-crs
andsource-crs/elasticsearch
directories. For example:Copy to Clipboard Copied! Toggle word wrap Toggle overflow -
Commit the
PolicyGenTemplate
change in Git, and then push to the Git repository that is monitored by the GitOps ZTP Argo CD policies application. Update the
ClusterGroupUpgrade
CR to include the changedPolicyGenTemplate
and save it ascgu-test.yaml
. The following example shows a generatedcgu-test.yaml
file.Copy to Clipboard Copied! Toggle word wrap Toggle overflow Apply the updated
ClusterGroupUpgrade
CR by running the following command:oc apply -f cgu-test.yaml
$ oc apply -f cgu-test.yaml
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
Verification
Check that the updates have succeeded by running the following command:
oc get cgu -A
$ oc get cgu -A
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
NAMESPACE NAME AGE STATE DETAILS ztp-clusters custom-source-cr 6s InProgress Remediating non-compliant policies ztp-install cluster1 19h Completed All clusters are compliant with all the managed policies
NAMESPACE NAME AGE STATE DETAILS ztp-clusters custom-source-cr 6s InProgress Remediating non-compliant policies ztp-install cluster1 19h Completed All clusters are compliant with all the managed policies
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
19.9.4. Configuring policy compliance evaluation timeouts for PolicyGenTemplate CRs Copy linkLink copied to clipboard!
Use Red Hat Advanced Cluster Management (RHACM) installed on a hub cluster to monitor and report on whether your managed clusters are compliant with applied policies. RHACM uses policy templates to apply predefined policy controllers and policies. Policy controllers are Kubernetes custom resource definition (CRD) instances.
You can override the default policy evaluation intervals with PolicyGenTemplate
custom resources (CRs). You configure duration settings that define how long a ConfigurationPolicy
CR can be in a state of policy compliance or non-compliance before RHACM re-evaluates the applied cluster policies.
The zero touch provisioning (ZTP) policy generator generates ConfigurationPolicy
CR policies with pre-defined policy evaluation intervals. The default value for the noncompliant
state is 10 seconds. The default value for the compliant
state is 10 minutes. To disable the evaluation interval, set the value to never
.
Prerequisites
-
You have installed the OpenShift CLI (
oc
). -
You have logged in to the hub cluster as a user with
cluster-admin
privileges. - You have created a Git repository where you manage your custom site configuration data.
Procedure
To configure the evaluation interval for all policies in a
PolicyGenTemplate
CR, addevaluationInterval
to thespec
field, and then set the appropriatecompliant
andnoncompliant
values. For example:spec: evaluationInterval: compliant: 30m noncompliant: 20s
spec: evaluationInterval: compliant: 30m noncompliant: 20s
Copy to Clipboard Copied! Toggle word wrap Toggle overflow To configure the evaluation interval for the
spec.sourceFiles
object in aPolicyGenTemplate
CR, addevaluationInterval
to thesourceFiles
field, for example:Copy to Clipboard Copied! Toggle word wrap Toggle overflow -
Commit the
PolicyGenTemplate
CRs files in the Git repository and push your changes.
Verification
Check that the managed spoke cluster policies are monitored at the expected intervals.
-
Log in as a user with
cluster-admin
privileges on the managed cluster. Get the pods that are running in the
open-cluster-management-agent-addon
namespace. Run the following command:oc get pods -n open-cluster-management-agent-addon
$ oc get pods -n open-cluster-management-agent-addon
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
NAME READY STATUS RESTARTS AGE config-policy-controller-858b894c68-v4xdb 1/1 Running 22 (5d8h ago) 10d
NAME READY STATUS RESTARTS AGE config-policy-controller-858b894c68-v4xdb 1/1 Running 22 (5d8h ago) 10d
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Check the applied policies are being evaluated at the expected interval in the logs for the
config-policy-controller
pod:oc logs -n open-cluster-management-agent-addon config-policy-controller-858b894c68-v4xdb
$ oc logs -n open-cluster-management-agent-addon config-policy-controller-858b894c68-v4xdb
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
2022-05-10T15:10:25.280Z info configuration-policy-controller controllers/configurationpolicy_controller.go:166 Skipping the policy evaluation due to the policy not reaching the evaluation interval {"policy": "compute-1-config-policy-config"} 2022-05-10T15:10:25.280Z info configuration-policy-controller controllers/configurationpolicy_controller.go:166 Skipping the policy evaluation due to the policy not reaching the evaluation interval {"policy": "compute-1-common-compute-1-catalog-policy-config"}
2022-05-10T15:10:25.280Z info configuration-policy-controller controllers/configurationpolicy_controller.go:166 Skipping the policy evaluation due to the policy not reaching the evaluation interval {"policy": "compute-1-config-policy-config"} 2022-05-10T15:10:25.280Z info configuration-policy-controller controllers/configurationpolicy_controller.go:166 Skipping the policy evaluation due to the policy not reaching the evaluation interval {"policy": "compute-1-common-compute-1-catalog-policy-config"}
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
19.9.5. Signalling ZTP cluster deployment completion with validator inform policies Copy linkLink copied to clipboard!
Create a validator inform policy that signals when the zero touch provisioning (ZTP) installation and configuration of the deployed cluster is complete. This policy can be used for deployments of single-node OpenShift clusters, three-node clusters, and standard clusters.
Procedure
Create a standalone
PolicyGenTemplate
custom resource (CR) that contains the source filevalidatorCRs/informDuValidator.yaml
. You only need one standalonePolicyGenTemplate
CR for each cluster type. For example, this CR applies a validator inform policy for single-node OpenShift clusters:Example single-node cluster validator inform policy CR (group-du-sno-validator-ranGen.yaml)
Copy to Clipboard Copied! Toggle word wrap Toggle overflow - 1
- The name of
PolicyGenTemplates
object. This name is also used as part of the names for theplacementBinding
,placementRule
, andpolicy
that are created in the requestednamespace
. - 2
- This value should match the
namespace
used in the groupPolicyGenTemplates
. - 3
- The
group-du-*
label defined inbindingRules
must exist in theSiteConfig
files. - 4
- The label defined in
bindingExcludedRules
must be`ztp-done:`. Theztp-done
label is used in coordination with the Topology Aware Lifecycle Manager. - 5
mcp
defines theMachineConfigPool
object that is used in the source filevalidatorCRs/informDuValidator.yaml
. It should bemaster
for single node and three-node cluster deployments andworker
for standard cluster deployments.- 6
- Optional. The default value is
inform
. - 7
- This value is used as part of the name for the generated RHACM policy. The generated validator policy for the single node example is
group-du-sno-validator-du-policy
.
-
Commit the
PolicyGenTemplate
CR file in your Git repository and push the changes.
19.9.6. Configuring PTP events with PolicyGenTemplate CRs Copy linkLink copied to clipboard!
You can use the GitOps ZTP pipeline to configure PTP events that use HTTP or AMQP transport.
HTTP transport is the default transport for PTP and bare-metal events. Use HTTP transport instead of AMQP for PTP and bare-metal events where possible. AMQ Interconnect is EOL from 30 June 2024. Extended life cycle support (ELS) for AMQ Interconnect ends 29 November 2029. For more information see, Red Hat AMQ Interconnect support status.
19.9.6.1. Configuring PTP events that use HTTP transport Copy linkLink copied to clipboard!
You can configure PTP events that use HTTP transport on managed clusters that you deploy with the GitOps Zero Touch Provisioning (ZTP) pipeline.
Prerequisites
-
You have installed the OpenShift CLI (
oc
). -
You have logged in as a user with
cluster-admin
privileges. - You have created a Git repository where you manage your custom site configuration data.
Procedure
Apply the following
PolicyGenTemplate
changes togroup-du-3node-ranGen.yaml
,group-du-sno-ranGen.yaml
, orgroup-du-standard-ranGen.yaml
files according to your requirements:In
.sourceFiles
, add thePtpOperatorConfig
CR file that configures the transport host:Copy to Clipboard Copied! Toggle word wrap Toggle overflow NoteIn OpenShift Container Platform 4.12 or later, you do not need to set the
transportHost
field in thePtpOperatorConfig
resource when you use HTTP transport with PTP events.Configure the
linuxptp
andphc2sys
for the PTP clock type and interface. For example, add the following stanza into.sourceFiles
:Copy to Clipboard Copied! Toggle word wrap Toggle overflow - 1
- Can be one of
PtpConfigMaster.yaml
,PtpConfigSlave.yaml
, orPtpConfigSlaveCvl.yaml
depending on your requirements.PtpConfigSlaveCvl.yaml
configureslinuxptp
services for an Intel E810 Columbiaville NIC. For configurations based ongroup-du-sno-ranGen.yaml
orgroup-du-3node-ranGen.yaml
, usePtpConfigSlave.yaml
. - 2
- Device specific interface name.
- 3
- You must append the
--summary_interval -4
value toptp4lOpts
in.spec.sourceFiles.spec.profile
to enable PTP fast events. - 4
- Required
phc2sysOpts
values.-m
prints messages tostdout
. Thelinuxptp-daemon
DaemonSet
parses the logs and generates Prometheus metrics. - 5
- Optional. If the
ptpClockThreshold
stanza is not present, default values are used for theptpClockThreshold
fields. The stanza shows defaultptpClockThreshold
values. TheptpClockThreshold
values configure how long after the PTP master clock is disconnected before PTP events are triggered.holdOverTimeout
is the time value in seconds before the PTP clock event state changes toFREERUN
when the PTP master clock is disconnected. ThemaxOffsetThreshold
andminOffsetThreshold
settings configure offset values in nanoseconds that compare against the values forCLOCK_REALTIME
(phc2sys
) or master offset (ptp4l
). When theptp4l
orphc2sys
offset value is outside this range, the PTP clock state is set toFREERUN
. When the offset value is within this range, the PTP clock state is set toLOCKED
.
- Merge any other required changes and files with your custom site repository.
- Push the changes to your site configuration repository to deploy PTP fast events to new sites using GitOps ZTP.
19.9.6.2. Configuring PTP events that use AMQP transport Copy linkLink copied to clipboard!
You can configure PTP events that use AMQP transport on managed clusters that you deploy with the GitOps Zero Touch Provisioning (ZTP) pipeline.
HTTP transport is the default transport for PTP and bare-metal events. Use HTTP transport instead of AMQP for PTP and bare-metal events where possible. AMQ Interconnect is EOL from 30 June 2024. Extended life cycle support (ELS) for AMQ Interconnect ends 29 November 2029. For more information see, Red Hat AMQ Interconnect support status.
Prerequisites
-
You have installed the OpenShift CLI (
oc
). -
You have logged in as a user with
cluster-admin
privileges. - You have created a Git repository where you manage your custom site configuration data.
Procedure
Add the following YAML into
.spec.sourceFiles
in thecommon-ranGen.yaml
file to configure the AMQP Operator:Copy to Clipboard Copied! Toggle word wrap Toggle overflow Apply the following
PolicyGenTemplate
changes togroup-du-3node-ranGen.yaml
,group-du-sno-ranGen.yaml
, orgroup-du-standard-ranGen.yaml
files according to your requirements:In
.sourceFiles
, add thePtpOperatorConfig
CR file that configures the AMQ transport host to theconfig-policy
:Copy to Clipboard Copied! Toggle word wrap Toggle overflow Configure the
linuxptp
andphc2sys
for the PTP clock type and interface. For example, add the following stanza into.sourceFiles
:Copy to Clipboard Copied! Toggle word wrap Toggle overflow - 1
- Can be one
PtpConfigMaster.yaml
,PtpConfigSlave.yaml
, orPtpConfigSlaveCvl.yaml
depending on your requirements.PtpConfigSlaveCvl.yaml
configureslinuxptp
services for an Intel E810 Columbiaville NIC. For configurations based ongroup-du-sno-ranGen.yaml
orgroup-du-3node-ranGen.yaml
, usePtpConfigSlave.yaml
. - 2
- Device specific interface name.
- 3
- You must append the
--summary_interval -4
value toptp4lOpts
in.spec.sourceFiles.spec.profile
to enable PTP fast events. - 4
- Required
phc2sysOpts
values.-m
prints messages tostdout
. Thelinuxptp-daemon
DaemonSet
parses the logs and generates Prometheus metrics. - 5
- Optional. If the
ptpClockThreshold
stanza is not present, default values are used for theptpClockThreshold
fields. The stanza shows defaultptpClockThreshold
values. TheptpClockThreshold
values configure how long after the PTP master clock is disconnected before PTP events are triggered.holdOverTimeout
is the time value in seconds before the PTP clock event state changes toFREERUN
when the PTP master clock is disconnected. ThemaxOffsetThreshold
andminOffsetThreshold
settings configure offset values in nanoseconds that compare against the values forCLOCK_REALTIME
(phc2sys
) or master offset (ptp4l
). When theptp4l
orphc2sys
offset value is outside this range, the PTP clock state is set toFREERUN
. When the offset value is within this range, the PTP clock state is set toLOCKED
.
Apply the following
PolicyGenTemplate
changes to your specific site YAML files, for example,example-sno-site.yaml
:In
.sourceFiles
, add theInterconnect
CR file that configures the AMQ router to theconfig-policy
:- fileName: AmqInstance.yaml policyName: "config-policy"
- fileName: AmqInstance.yaml policyName: "config-policy"
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
- Merge any other required changes and files with your custom site repository.
- Push the changes to your site configuration repository to deploy PTP fast events to new sites using GitOps ZTP.
19.9.7. Configuring bare-metal events with PolicyGenTemplate CRs Copy linkLink copied to clipboard!
You can use the GitOps ZTP pipeline to configure bare-metal events that use HTTP or AMQP transport.
HTTP transport is the default transport for PTP and bare-metal events. Use HTTP transport instead of AMQP for PTP and bare-metal events where possible. AMQ Interconnect is EOL from 30 June 2024. Extended life cycle support (ELS) for AMQ Interconnect ends 29 November 2029. For more information see, Red Hat AMQ Interconnect support status.
19.9.7.1. Configuring bare-metal events that use HTTP transport Copy linkLink copied to clipboard!
You can configure bare-metal events that use HTTP transport on managed clusters that you deploy with the GitOps Zero Touch Provisioning (ZTP) pipeline.
Prerequisites
-
You have installed the OpenShift CLI (
oc
). -
You have logged in as a user with
cluster-admin
privileges. - You have created a Git repository where you manage your custom site configuration data.
Procedure
Configure the Bare Metal Event Relay Operator by adding the following YAML to
spec.sourceFiles
in thecommon-ranGen.yaml
file:Copy to Clipboard Copied! Toggle word wrap Toggle overflow Add the
HardwareEvent
CR tospec.sourceFiles
in your specific group configuration file, for example, in thegroup-du-sno-ranGen.yaml
file:Copy to Clipboard Copied! Toggle word wrap Toggle overflow - 1
- Each baseboard management controller (BMC) requires a single
HardwareEvent
CR only.
NoteIn OpenShift Container Platform 4.12 or later, you do not need to set the
transportHost
field in theHardwareEvent
custom resource (CR) when you use HTTP transport with bare-metal events.- Merge any other required changes and files with your custom site repository.
- Push the changes to your site configuration repository to deploy bare-metal events to new sites with GitOps ZTP.
Create the Redfish Secret by running the following command:
oc -n openshift-bare-metal-events create secret generic redfish-basic-auth \ --from-literal=username=<bmc_username> --from-literal=password=<bmc_password> \ --from-literal=hostaddr="<bmc_host_ip_addr>"
$ oc -n openshift-bare-metal-events create secret generic redfish-basic-auth \ --from-literal=username=<bmc_username> --from-literal=password=<bmc_password> \ --from-literal=hostaddr="<bmc_host_ip_addr>"
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
19.9.7.2. Configuring bare-metal events that use AMQP transport Copy linkLink copied to clipboard!
You can configure bare-metal events that use AMQP transport on managed clusters that you deploy with the GitOps Zero Touch Provisioning (ZTP) pipeline.
HTTP transport is the default transport for PTP and bare-metal events. Use HTTP transport instead of AMQP for PTP and bare-metal events where possible. AMQ Interconnect is EOL from 30 June 2024. Extended life cycle support (ELS) for AMQ Interconnect ends 29 November 2029. For more information see, Red Hat AMQ Interconnect support status.
Prerequisites
-
You have installed the OpenShift CLI (
oc
). -
You have logged in as a user with
cluster-admin
privileges. - You have created a Git repository where you manage your custom site configuration data.
Procedure
To configure the AMQ Interconnect Operator and the Bare Metal Event Relay Operator, add the following YAML to
spec.sourceFiles
in thecommon-ranGen.yaml
file:Copy to Clipboard Copied! Toggle word wrap Toggle overflow Add the
Interconnect
CR to.spec.sourceFiles
in the site configuration file, for example, theexample-sno-site.yaml
file:- fileName: AmqInstance.yaml policyName: "config-policy"
- fileName: AmqInstance.yaml policyName: "config-policy"
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Add the
HardwareEvent
CR tospec.sourceFiles
in your specific group configuration file, for example, in thegroup-du-sno-ranGen.yaml
file:Copy to Clipboard Copied! Toggle word wrap Toggle overflow - 1
- The
transportHost
URL is composed of the existing AMQ Interconnect CRname
andnamespace
. For example, intransportHost: "amqp://amq-router.amq-router.svc.cluster.local"
, the AMQ Interconnectname
andnamespace
are both set toamq-router
.
NoteEach baseboard management controller (BMC) requires a single
HardwareEvent
resource only.-
Commit the
PolicyGenTemplate
change in Git, and then push the changes to your site configuration repository to deploy bare-metal events monitoring to new sites using GitOps ZTP. Create the Redfish Secret by running the following command:
oc -n openshift-bare-metal-events create secret generic redfish-basic-auth \ --from-literal=username=<bmc_username> --from-literal=password=<bmc_password> \ --from-literal=hostaddr="<bmc_host_ip_addr>"
$ oc -n openshift-bare-metal-events create secret generic redfish-basic-auth \ --from-literal=username=<bmc_username> --from-literal=password=<bmc_password> \ --from-literal=hostaddr="<bmc_host_ip_addr>"
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
19.9.8. Configuring the Image Registry Operator for local caching of images Copy linkLink copied to clipboard!
OpenShift Container Platform manages image caching using a local registry. In edge computing use cases, clusters are often subject to bandwidth restrictions when communicating with centralized image registries, which might result in long image download times.
Long download times are unavoidable during initial deployment. Over time, there is a risk that CRI-O will erase the /var/lib/containers/storage
directory in the case of an unexpected shutdown. To address long image download times, you can create a local image registry on remote managed clusters using GitOps ZTP. This is useful in Edge computing scenarios where clusters are deployed at the far edge of the network.
Before you can set up the local image registry with GitOps ZTP, you need to configure disk partitioning in the SiteConfig
CR that you use to install the remote managed cluster. After installation, you configure the local image registry using a PolicyGenTemplate
CR. Then, the ZTP pipeline creates Persistent Volume (PV) and Persistent Volume Claim (PVC) CRs and patches the imageregistry
configuration.
The local image registry can only be used for user application images and cannot be used for the OpenShift Container Platform or Operator Lifecycle Manager operator images.
19.9.8.1. Configuring disk partitioning with SiteConfig Copy linkLink copied to clipboard!
Configure disk partitioning for a managed cluster using a SiteConfig
CR and GitOps Zero Touch Provisioning (ZTP). The disk partition details in the SiteConfig
CR must match the underlying disk.
You must complete this procedure at installation time.
Prerequisites
- Install Butane.
Procedure
Create the
storage.bu
file by using the following example YAML file:Copy to Clipboard Copied! Toggle word wrap Toggle overflow Convert the
storage.bu
file to an Ignition file by running the following command:butane storage.bu
$ butane storage.bu
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
{"ignition":{"version":"3.2.0"},"storage":{"disks":[{"device":"/dev/disk/by-path/pci-0000:01:00.0-scsi-0:2:0:0","partitions":[{"label":"var-lib-containers","sizeMiB":0,"startMiB":250000}],"wipeTable":false}],"filesystems":[{"device":"/dev/disk/by-partlabel/var-lib-containers","format":"xfs","mountOptions":["defaults","prjquota"],"path":"/var/lib/containers","wipeFilesystem":true}]},"systemd":{"units":[{"contents":"# # Generated by Butane\n[Unit]\nRequires=systemd-fsck@dev-disk-by\\x2dpartlabel-var\\x2dlib\\x2dcontainers.service\nAfter=systemd-fsck@dev-disk-by\\x2dpartlabel-var\\x2dlib\\x2dcontainers.service\n\n[Mount]\nWhere=/var/lib/containers\nWhat=/dev/disk/by-partlabel/var-lib-containers\nType=xfs\nOptions=defaults,prjquota\n\n[Install]\nRequiredBy=local-fs.target","enabled":true,"name":"var-lib-containers.mount"}]}}
{"ignition":{"version":"3.2.0"},"storage":{"disks":[{"device":"/dev/disk/by-path/pci-0000:01:00.0-scsi-0:2:0:0","partitions":[{"label":"var-lib-containers","sizeMiB":0,"startMiB":250000}],"wipeTable":false}],"filesystems":[{"device":"/dev/disk/by-partlabel/var-lib-containers","format":"xfs","mountOptions":["defaults","prjquota"],"path":"/var/lib/containers","wipeFilesystem":true}]},"systemd":{"units":[{"contents":"# # Generated by Butane\n[Unit]\nRequires=systemd-fsck@dev-disk-by\\x2dpartlabel-var\\x2dlib\\x2dcontainers.service\nAfter=systemd-fsck@dev-disk-by\\x2dpartlabel-var\\x2dlib\\x2dcontainers.service\n\n[Mount]\nWhere=/var/lib/containers\nWhat=/dev/disk/by-partlabel/var-lib-containers\nType=xfs\nOptions=defaults,prjquota\n\n[Install]\nRequiredBy=local-fs.target","enabled":true,"name":"var-lib-containers.mount"}]}}
Copy to Clipboard Copied! Toggle word wrap Toggle overflow - Use a tool such as JSON Pretty Print to convert the output into JSON format.
Copy the output into the
.spec.clusters.nodes.ignitionConfigOverride
field in theSiteConfig
CR:Copy to Clipboard Copied! Toggle word wrap Toggle overflow NoteIf the
.spec.clusters.nodes.ignitionConfigOverride
field does not exist, create it.
Verification
During or after installation, verify on the hub cluster that the
BareMetalHost
object shows the annotation by running the following command:oc get bmh -n my-sno-ns my-sno -ojson | jq '.metadata.annotations["bmac.agent-install.openshift.io/ignition-config-overrides"]
$ oc get bmh -n my-sno-ns my-sno -ojson | jq '.metadata.annotations["bmac.agent-install.openshift.io/ignition-config-overrides"]
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
"{\"ignition\":{\"version\":\"3.2.0\"},\"storage\":{\"disks\":[{\"device\":\"/dev/disk/by-id/wwn-0x6b07b250ebb9d0002a33509f24af1f62\",\"partitions\":[{\"label\":\"var-lib-containers\",\"sizeMiB\":0,\"startMiB\":250000}],\"wipeTable\":false}],\"filesystems\":[{\"device\":\"/dev/disk/by-partlabel/var-lib-containers\",\"format\":\"xfs\",\"mountOptions\":[\"defaults\",\"prjquota\"],\"path\":\"/var/lib/containers\",\"wipeFilesystem\":true}]},\"systemd\":{\"units\":[{\"contents\":\"# Generated by Butane\\n[Unit]\\nRequires=systemd-fsck@dev-disk-by\\\\x2dpartlabel-var\\\\x2dlib\\\\x2dcontainers.service\\nAfter=systemd-fsck@dev-disk-by\\\\x2dpartlabel-var\\\\x2dlib\\\\x2dcontainers.service\\n\\n[Mount]\\nWhere=/var/lib/containers\\nWhat=/dev/disk/by-partlabel/var-lib-containers\\nType=xfs\\nOptions=defaults,prjquota\\n\\n[Install]\\nRequiredBy=local-fs.target\",\"enabled\":true,\"name\":\"var-lib-containers.mount\"}]}}"
"{\"ignition\":{\"version\":\"3.2.0\"},\"storage\":{\"disks\":[{\"device\":\"/dev/disk/by-id/wwn-0x6b07b250ebb9d0002a33509f24af1f62\",\"partitions\":[{\"label\":\"var-lib-containers\",\"sizeMiB\":0,\"startMiB\":250000}],\"wipeTable\":false}],\"filesystems\":[{\"device\":\"/dev/disk/by-partlabel/var-lib-containers\",\"format\":\"xfs\",\"mountOptions\":[\"defaults\",\"prjquota\"],\"path\":\"/var/lib/containers\",\"wipeFilesystem\":true}]},\"systemd\":{\"units\":[{\"contents\":\"# Generated by Butane\\n[Unit]\\nRequires=systemd-fsck@dev-disk-by\\\\x2dpartlabel-var\\\\x2dlib\\\\x2dcontainers.service\\nAfter=systemd-fsck@dev-disk-by\\\\x2dpartlabel-var\\\\x2dlib\\\\x2dcontainers.service\\n\\n[Mount]\\nWhere=/var/lib/containers\\nWhat=/dev/disk/by-partlabel/var-lib-containers\\nType=xfs\\nOptions=defaults,prjquota\\n\\n[Install]\\nRequiredBy=local-fs.target\",\"enabled\":true,\"name\":\"var-lib-containers.mount\"}]}}"
Copy to Clipboard Copied! Toggle word wrap Toggle overflow After installation, check the single-node OpenShift disk status:
Enter into a debug session on the single-node OpenShift node by running the following command.
This step instantiates a debug pod called
<node_name>-debug
:oc debug node/my-sno-node
$ oc debug node/my-sno-node
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Set
/host
as the root directory within the debug shell by running the following command.The debug pod mounts the host’s root file system in
/host
within the pod. By changing the root directory to/host
, you can run binaries contained in the host’s executable paths:chroot /host
# chroot /host
Copy to Clipboard Copied! Toggle word wrap Toggle overflow List information about all available block devices by running the following command:
lsblk
# lsblk
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Display information about the file system disk space usage by running the following command:
df -h
# df -h
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
19.9.8.2. Configuring the image registry using PolicyGenTemplate CRs Copy linkLink copied to clipboard!
Use PolicyGenTemplate
(PGT) CRs to apply the CRs required to configure the image registry and patch the imageregistry
configuration.
Prerequisites
- You have configured a disk partition in the managed cluster.
-
You have installed the OpenShift CLI (
oc
). -
You have logged in to the hub cluster as a user with
cluster-admin
privileges. - You have created a Git repository where you manage your custom site configuration data for use with GitOps Zero Touch Provisioning (ZTP).
Procedure
Configure the storage class, persistent volume claim, persistent volume, and image registry configuration in the appropriate
PolicyGenTemplate
CR. For example, to configure an individual site, add the following YAML to the fileexample-sno-site.yaml
:Copy to Clipboard Copied! Toggle word wrap Toggle overflow - 1
- Set the appropriate value for
ztp-deploy-wave
depending on whether you are configuring image registries at the site, common, or group level.ztp-deploy-wave: "100"
is suitable for development or testing because it allows you to group the referenced source files together. - 2
- In
ImageRegistryPV.yaml
, ensure that thespec.local.path
field is set to/var/imageregistry
to match the value set for themount_point
field in theSiteConfig
CR.
ImportantDo not set
complianceType: mustonlyhave
for the- fileName: ImageRegistryConfig.yaml
configuration. This can cause the registry pod deployment to fail.-
Commit the
PolicyGenTemplate
change in Git, and then push to the Git repository being monitored by the GitOps ZTP ArgoCD application.
Verification
Use the following steps to troubleshoot errors with the local image registry on the managed clusters:
Verify successful login to the registry while logged in to the managed cluster. Run the following commands:
Export the managed cluster name:
cluster=<managed_cluster_name>
$ cluster=<managed_cluster_name>
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Get the managed cluster
kubeconfig
details:oc get secret -n $cluster $cluster-admin-password -o jsonpath='{.data.password}' | base64 -d > kubeadmin-password-$cluster
$ oc get secret -n $cluster $cluster-admin-password -o jsonpath='{.data.password}' | base64 -d > kubeadmin-password-$cluster
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Download and export the cluster
kubeconfig
:oc get secret -n $cluster $cluster-admin-kubeconfig -o jsonpath='{.data.kubeconfig}' | base64 -d > kubeconfig-$cluster && export KUBECONFIG=./kubeconfig-$cluster
$ oc get secret -n $cluster $cluster-admin-kubeconfig -o jsonpath='{.data.kubeconfig}' | base64 -d > kubeconfig-$cluster && export KUBECONFIG=./kubeconfig-$cluster
Copy to Clipboard Copied! Toggle word wrap Toggle overflow - Verify access to the image registry from the managed cluster. See "Accessing the registry".
Check that the
Config
CRD in theimageregistry.operator.openshift.io
group instance is not reporting errors. Run the following command while logged in to the managed cluster:oc get image.config.openshift.io cluster -o yaml
$ oc get image.config.openshift.io cluster -o yaml
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Check that the
PersistentVolumeClaim
on the managed cluster is populated with data. Run the following command while logged in to the managed cluster:oc get pv image-registry-sc
$ oc get pv image-registry-sc
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Check that the
registry*
pod is running and is located under theopenshift-image-registry
namespace.oc get pods -n openshift-image-registry | grep registry*
$ oc get pods -n openshift-image-registry | grep registry*
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
cluster-image-registry-operator-68f5c9c589-42cfg 1/1 Running 0 8d image-registry-5f8987879-6nx6h 1/1 Running 0 8d
cluster-image-registry-operator-68f5c9c589-42cfg 1/1 Running 0 8d image-registry-5f8987879-6nx6h 1/1 Running 0 8d
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Check that the disk partition on the managed cluster is correct:
Open a debug shell to the managed cluster:
oc debug node/sno-1.example.com
$ oc debug node/sno-1.example.com
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Run
lsblk
to check the host disk partitions:Copy to Clipboard Copied! Toggle word wrap Toggle overflow - 1
/var/imageregistry
indicates that the disk is correctly partitioned.
19.9.9. Using hub templates in PolicyGenTemplate CRs Copy linkLink copied to clipboard!
Topology Aware Lifecycle Manager supports partial Red Hat Advanced Cluster Management (RHACM) hub cluster template functions in configuration policies used with GitOps ZTP.
Hub-side cluster templates allow you to define configuration policies that can be dynamically customized to the target clusters. This reduces the need to create separate policies for many clusters with similiar configurations but with different values.
Policy templates are restricted to the same namespace as the namespace where the policy is defined. This means that you must create the objects referenced in the hub template in the same namespace where the policy is created.
The following supported hub template functions are available for use in GitOps ZTP with TALM:
fromConfigmap
returns the value of the provided data key in the namedConfigMap
resource.NoteThere is a 1 MiB size limit for
ConfigMap
CRs. The effective size forConfigMap
CRs is further limited by thelast-applied-configuration
annotation. To avoid thelast-applied-configuration
limitation, add the following annotation to the templateConfigMap
:argocd.argoproj.io/sync-options: Replace=true
argocd.argoproj.io/sync-options: Replace=true
Copy to Clipboard Copied! Toggle word wrap Toggle overflow -
base64enc
returns the base64-encoded value of the input string -
base64dec
returns the decoded value of the base64-encoded input string -
indent
returns the input string with added indent spaces -
autoindent
returns the input string with added indent spaces based on the spacing used in the parent template -
toInt
casts and returns the integer value of the input value -
toBool
converts the input string into a boolean value, and returns the boolean
Various Open source community functions are also available for use with GitOps ZTP.
19.9.9.1. Example hub templates Copy linkLink copied to clipboard!
The following code examples are valid hub templates. Each of these templates return values from the ConfigMap
CR with the name test-config
in the default
namespace.
Returns the value with the key
common-key
:{{hub fromConfigMap "default" "test-config" "common-key" hub}}
{{hub fromConfigMap "default" "test-config" "common-key" hub}}
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Returns a string by using the concatenated value of the
.ManagedClusterName
field and the string-name
:{{hub fromConfigMap "default" "test-config" (printf "%s-name" .ManagedClusterName) hub}}
{{hub fromConfigMap "default" "test-config" (printf "%s-name" .ManagedClusterName) hub}}
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Casts and returns a boolean value from the concatenated value of the
.ManagedClusterName
field and the string-name
:{{hub fromConfigMap "default" "test-config" (printf "%s-name" .ManagedClusterName) | toBool hub}}
{{hub fromConfigMap "default" "test-config" (printf "%s-name" .ManagedClusterName) | toBool hub}}
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Casts and returns an integer value from the concatenated value of the
.ManagedClusterName
field and the string-name
:{{hub (printf "%s-name" .ManagedClusterName) | fromConfigMap "default" "test-config" | toInt hub}}
{{hub (printf "%s-name" .ManagedClusterName) | fromConfigMap "default" "test-config" | toInt hub}}
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
19.9.9.2. Specifying host NICs in site PolicyGenTemplate CRs with hub cluster templates Copy linkLink copied to clipboard!
You can manage host NICs in a single ConfigMap
CR and use hub cluster templates to populate the custom NIC values in the generated polices that get applied to the cluster hosts. Using hub cluster templates in site PolicyGenTemplate
(PGT) CRs means that you do not need to create multiple single site PGT CRs for each site.
The following example shows you how to use a single ConfigMap
CR to manage cluster host NICs and apply them to the cluster as polices by using a single PolicyGenTemplate
site CR.
When you use the fromConfigmap
function, the printf
variable is only available for the template resource data
key fields. You cannot use it with name
and namespace
fields.
Prerequisites
-
You have installed the OpenShift CLI (
oc
). -
You have logged in to the hub cluster as a user with
cluster-admin
privileges. - You have created a Git repository where you manage your custom site configuration data. The repository must be accessible from the hub cluster and be defined as a source repository for the GitOps ZTP ArgoCD application.
Procedure
Create a
ConfigMap
resource that describes the NICs for a group of hosts. For example:Copy to Clipboard Copied! Toggle word wrap Toggle overflow - 1
- The
argocd.argoproj.io/sync-options
annotation is required only if theConfigMap
is larger than 1 MiB in size.
NoteThe
ConfigMap
must be in the same namespace with the policy that has the hub template substitution.-
Commit the
ConfigMap
CR in Git, and then push to the Git repository being monitored by the Argo CD application. Create a site PGT CR that uses templates to pull the required data from the
ConfigMap
object. For example:Copy to Clipboard Copied! Toggle word wrap Toggle overflow Commit the site
PolicyGenTemplate
CR in Git and push to the Git repository that is monitored by the ArgoCD application.NoteSubsequent changes to the referenced
ConfigMap
CR are not automatically synced to the applied policies. You need to manually sync the newConfigMap
changes to update existing PolicyGenTemplate CRs. See "Syncing new ConfigMap changes to existing PolicyGenTemplate CRs".
19.9.9.3. Specifying VLAN IDs in group PolicyGenTemplate CRs with hub cluster templates Copy linkLink copied to clipboard!
You can manage VLAN IDs for managed clusters in a single ConfigMap
CR and use hub cluster templates to populate the VLAN IDs in the generated polices that get applied to the clusters.
The following example shows how you how manage VLAN IDs in single ConfigMap
CR and apply them in individual cluster polices by using a single PolicyGenTemplate
group CR.
When using the fromConfigmap
function, the printf
variable is only available for the template resource data
key fields. You cannot use it with name
and namespace
fields.
Prerequisites
-
You have installed the OpenShift CLI (
oc
). -
You have logged in to the hub cluster as a user with
cluster-admin
privileges. - You have created a Git repository where you manage your custom site configuration data. The repository must be accessible from the hub cluster and be defined as a source repository for the Argo CD application.
Procedure
Create a
ConfigMap
CR that describes the VLAN IDs for a group of cluster hosts. For example:Copy to Clipboard Copied! Toggle word wrap Toggle overflow - 1
- The
argocd.argoproj.io/sync-options
annotation is required only if theConfigMap
is larger than 1 MiB in size.
NoteThe
ConfigMap
must be in the same namespace with the policy that has the hub template substitution.-
Commit the
ConfigMap
CR in Git, and then push to the Git repository being monitored by the Argo CD application. Create a group PGT CR that uses a hub template to pull the required VLAN IDs from the
ConfigMap
object. For example, add the following YAML snippet to the group PGT CR:Copy to Clipboard Copied! Toggle word wrap Toggle overflow Commit the group
PolicyGenTemplate
CR in Git, and then push to the Git repository being monitored by the Argo CD application.NoteSubsequent changes to the referenced
ConfigMap
CR are not automatically synced to the applied policies. You need to manually sync the newConfigMap
changes to update existing PolicyGenTemplate CRs. See "Syncing new ConfigMap changes to existing PolicyGenTemplate CRs".
19.9.9.4. Syncing new ConfigMap changes to existing PolicyGenTemplate CRs Copy linkLink copied to clipboard!
Prerequisites
-
You have installed the OpenShift CLI (
oc
). -
You have logged in to the hub cluster as a user with
cluster-admin
privileges. -
You have created a
PolicyGenTemplate
CR that pulls information from aConfigMap
CR using hub cluster templates.
Procedure
-
Update the contents of your
ConfigMap
CR, and apply the changes in the hub cluster. To sync the contents of the updated
ConfigMap
CR to the deployed policy, do either of the following:Option 1: Delete the existing policy. ArgoCD uses the
PolicyGenTemplate
CR to immediately recreate the deleted policy. For example, run the following command:oc delete policy <policy_name> -n <policy_namespace>
$ oc delete policy <policy_name> -n <policy_namespace>
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Option 2: Apply a special annotation
policy.open-cluster-management.io/trigger-update
to the policy with a different value every time when you update theConfigMap
. For example:oc annotate policy <policy_name> -n <policy_namespace> policy.open-cluster-management.io/trigger-update="1"
$ oc annotate policy <policy_name> -n <policy_namespace> policy.open-cluster-management.io/trigger-update="1"
Copy to Clipboard Copied! Toggle word wrap Toggle overflow NoteYou must apply the updated policy for the changes to take effect. For more information, see Special annotation for reprocessing.
Optional: If it exists, delete the
ClusterGroupUpdate
CR that contains the policy. For example:oc delete clustergroupupgrade <cgu_name> -n <cgu_namespace>
$ oc delete clustergroupupgrade <cgu_name> -n <cgu_namespace>
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Create a new
ClusterGroupUpdate
CR that includes the policy to apply with the updatedConfigMap
changes. For example, add the following YAML to the filecgr-example.yaml
:Copy to Clipboard Copied! Toggle word wrap Toggle overflow Apply the updated policy:
oc apply -f cgr-example.yaml
$ oc apply -f cgr-example.yaml
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
19.10. Updating managed clusters with the Topology Aware Lifecycle Manager Copy linkLink copied to clipboard!
You can use the Topology Aware Lifecycle Manager (TALM) to manage the software lifecycle of multiple clusters. TALM uses Red Hat Advanced Cluster Management (RHACM) policies to perform changes on the target clusters.
19.10.1. About the Topology Aware Lifecycle Manager configuration Copy linkLink copied to clipboard!
The Topology Aware Lifecycle Manager (TALM) manages the deployment of Red Hat Advanced Cluster Management (RHACM) policies for one or more OpenShift Container Platform clusters. Using TALM in a large network of clusters allows the phased rollout of policies to the clusters in limited batches. This helps to minimize possible service disruptions when updating. With TALM, you can control the following actions:
- The timing of the update
- The number of RHACM-managed clusters
- The subset of managed clusters to apply the policies to
- The update order of the clusters
- The set of policies remediated to the cluster
- The order of policies remediated to the cluster
- The assignment of a canary cluster
For single-node OpenShift, the Topology Aware Lifecycle Manager (TALM) offers the following features:
- Create a backup of a deployment before an upgrade
- Pre-caching images for clusters with limited bandwidth
TALM supports the orchestration of the OpenShift Container Platform y-stream and z-stream updates, and day-two operations on y-streams and z-streams.
19.10.2. About managed policies used with Topology Aware Lifecycle Manager Copy linkLink copied to clipboard!
The Topology Aware Lifecycle Manager (TALM) uses RHACM policies for cluster updates.
TALM can be used to manage the rollout of any policy CR where the remediationAction
field is set to inform
. Supported use cases include the following:
- Manual user creation of policy CRs
-
Automatically generated policies from the
PolicyGenTemplate
custom resource definition (CRD)
For policies that update an Operator subscription with manual approval, TALM provides additional functionality that approves the installation of the updated Operator.
For more information about managed policies, see Policy Overview in the RHACM documentation.
For more information about the PolicyGenTemplate
CRD, see the "About the PolicyGenTemplate CRD" section in "Configuring managed clusters with policies and PolicyGenTemplate resources".
19.10.3. Installing the Topology Aware Lifecycle Manager by using the web console Copy linkLink copied to clipboard!
You can use the OpenShift Container Platform web console to install the Topology Aware Lifecycle Manager.
Prerequisites
- Install the latest version of the RHACM Operator.
- Set up a hub cluster with disconnected regitry.
-
Log in as a user with
cluster-admin
privileges.
Procedure
-
In the OpenShift Container Platform web console, navigate to Operators
OperatorHub. - Search for the Topology Aware Lifecycle Manager from the list of available Operators, and then click Install.
- Keep the default selection of Installation mode ["All namespaces on the cluster (default)"] and Installed Namespace ("openshift-operators") to ensure that the Operator is installed properly.
- Click Install.
Verification
To confirm that the installation is successful:
-
Navigate to the Operators
Installed Operators page. -
Check that the Operator is installed in the
All Namespaces
namespace and its status isSucceeded
.
If the Operator is not installed successfully:
-
Navigate to the Operators
Installed Operators page and inspect the Status
column for any errors or failures. -
Navigate to the Workloads
Pods page and check the logs in any containers in the cluster-group-upgrades-controller-manager
pod that are reporting issues.
19.10.4. Installing the Topology Aware Lifecycle Manager by using the CLI Copy linkLink copied to clipboard!
You can use the OpenShift CLI (oc
) to install the Topology Aware Lifecycle Manager (TALM).
Prerequisites
-
Install the OpenShift CLI (
oc
). - Install the latest version of the RHACM Operator.
- Set up a hub cluster with disconnected registry.
-
Log in as a user with
cluster-admin
privileges.
Procedure
Create a
Subscription
CR:Define the
Subscription
CR and save the YAML file, for example,talm-subscription.yaml
:Copy to Clipboard Copied! Toggle word wrap Toggle overflow Create the
Subscription
CR by running the following command:oc create -f talm-subscription.yaml
$ oc create -f talm-subscription.yaml
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
Verification
Verify that the installation succeeded by inspecting the CSV resource:
oc get csv -n openshift-operators
$ oc get csv -n openshift-operators
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
NAME DISPLAY VERSION REPLACES PHASE topology-aware-lifecycle-manager.4.12.x Topology Aware Lifecycle Manager 4.12.x Succeeded
NAME DISPLAY VERSION REPLACES PHASE topology-aware-lifecycle-manager.4.12.x Topology Aware Lifecycle Manager 4.12.x Succeeded
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Verify that the TALM is up and running:
oc get deploy -n openshift-operators
$ oc get deploy -n openshift-operators
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
NAMESPACE NAME READY UP-TO-DATE AVAILABLE AGE openshift-operators cluster-group-upgrades-controller-manager 1/1 1 1 14s
NAMESPACE NAME READY UP-TO-DATE AVAILABLE AGE openshift-operators cluster-group-upgrades-controller-manager 1/1 1 1 14s
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
19.10.5. About the ClusterGroupUpgrade CR Copy linkLink copied to clipboard!
The Topology Aware Lifecycle Manager (TALM) builds the remediation plan from the ClusterGroupUpgrade
CR for a group of clusters. You can define the following specifications in a ClusterGroupUpgrade
CR:
- Clusters in the group
-
Blocking
ClusterGroupUpgrade
CRs - Applicable list of managed policies
- Number of concurrent updates
- Applicable canary updates
- Actions to perform before and after the update
- Update timing
You can control the start time of an update using the enable
field in the ClusterGroupUpgrade
CR. For example, if you have a scheduled maintenance window of four hours, you can prepare a ClusterGroupUpgrade
CR with the enable
field set to false
.
You can set the timeout by configuring the spec.remediationStrategy.timeout
setting as follows:
spec remediationStrategy: maxConcurrency: 1 timeout: 240
spec
remediationStrategy:
maxConcurrency: 1
timeout: 240
You can use the batchTimeoutAction
to determine what happens if an update fails for a cluster. You can specify continue
to skip the failing cluster and continue to upgrade other clusters, or abort
to stop policy remediation for all clusters. Once the timeout elapses, TALM removes all enforce
policies to ensure that no further updates are made to clusters.
To apply the changes, you set the enabled
field to true
.
For more information see the "Applying update policies to managed clusters" section.
As TALM works through remediation of the policies to the specified clusters, the ClusterGroupUpgrade
CR can report true or false statuses for a number of conditions.
After TALM completes a cluster update, the cluster does not update again under the control of the same ClusterGroupUpgrade
CR. You must create a new ClusterGroupUpgrade
CR in the following cases:
- When you need to update the cluster again
-
When the cluster changes to non-compliant with the
inform
policy after being updated
19.10.5.1. Selecting clusters Copy linkLink copied to clipboard!
TALM builds a remediation plan and selects clusters based on the following fields:
-
The
clusterLabelSelector
field specifies the labels of the clusters that you want to update. This consists of a list of the standard label selectors fromk8s.io/apimachinery/pkg/apis/meta/v1
. Each selector in the list uses either label value pairs or label expressions. Matches from each selector are added to the final list of clusters along with the matches from theclusterSelector
field and thecluster
field. -
The
clusters
field specifies a list of clusters to update. -
The
canaries
field specifies the clusters for canary updates. -
The
maxConcurrency
field specifies the number of clusters to update in a batch.
You can use the clusters
, clusterLabelSelector
, and clusterSelector
fields together to create a combined list of clusters.
The remediation plan starts with the clusters listed in the canaries
field. Each canary cluster forms a single-cluster batch.
Sample ClusterGroupUpgrade
CR with the enabled field
set to false
- 1
- Defines the list of clusters to update.
- 2
- The
enable
field is set tofalse
. - 3
- Lists the user-defined set of policies to remediate.
- 4
- Defines the specifics of the cluster updates.
- 5
- Defines the clusters for canary updates.
- 6
- Defines the maximum number of concurrent updates in a batch. The number of remediation batches is the number of canary clusters, plus the number of clusters, except the canary clusters, divided by the maxConcurrency value. The clusters that are already compliant with all the managed policies are excluded from the remediation plan.
- 7
- Displays the parameters for selecting clusters.
- 8
- Controls what happens if a batch times out. Possible values are
abort
orcontinue
. If unspecified, the default iscontinue
. - 9
- Displays information about the status of the updates.
- 10
- The
ClustersSelected
condition shows that all selected clusters are valid. - 11
- The
Validated
condition shows that all selected clusters have been validated.
Any failures during the update of a canary cluster stops the update process.
When the remediation plan is successfully created, you can you set the enable
field to true
and TALM starts to update the non-compliant clusters with the specified managed policies.
You can only make changes to the spec
fields if the enable
field of the ClusterGroupUpgrade
CR is set to false
.
19.10.5.2. Validating Copy linkLink copied to clipboard!
TALM checks that all specified managed policies are available and correct, and uses the Validated
condition to report the status and reasons as follows:
true
Validation is completed.
false
Policies are missing or invalid, or an invalid platform image has been specified.
19.10.5.3. Pre-caching Copy linkLink copied to clipboard!
Clusters might have limited bandwidth to access the container image registry, which can cause a timeout before the updates are completed. On single-node OpenShift clusters, you can use pre-caching to avoid this. The container image pre-caching starts when you create a ClusterGroupUpgrade
CR with the preCaching
field set to true
.
TALM uses the PrecacheSpecValid
condition to report status information as follows:
true
The pre-caching spec is valid and consistent.
false
The pre-caching spec is incomplete.
TALM uses the PrecachingSucceeded
condition to report status information as follows:
true
TALM has concluded the pre-caching process. If pre-caching fails for any cluster, the update fails for that cluster but proceeds for all other clusters. A message informs you if pre-caching has failed for any clusters.
false
Pre-caching is still in progress for one or more clusters or has failed for all clusters.
For more information see the "Using the container image pre-cache feature" section.
19.10.5.4. Creating a backup Copy linkLink copied to clipboard!
For single-node OpenShift, TALM can create a backup of a deployment before an update. If the update fails, you can recover the previous version and restore a cluster to a working state without requiring a reprovision of applications. To use the backup feature you first create a ClusterGroupUpgrade
CR with the backup
field set to true
. To ensure that the contents of the backup are up to date, the backup is not taken until you set the enable
field in the ClusterGroupUpgrade
CR to true
.
TALM uses the BackupSucceeded
condition to report the status and reasons as follows:
true
Backup is completed for all clusters or the backup run has completed but failed for one or more clusters. If backup fails for any cluster, the update fails for that cluster but proceeds for all other clusters.
false
Backup is still in progress for one or more clusters or has failed for all clusters.
For more information, see the "Creating a backup of cluster resources before upgrade" section.
19.10.5.5. Updating clusters Copy linkLink copied to clipboard!
TALM enforces the policies following the remediation plan. Enforcing the policies for subsequent batches starts immediately after all the clusters of the current batch are compliant with all the managed policies. If the batch times out, TALM moves on to the next batch. The timeout value of a batch is the spec.timeout
field divided by the number of batches in the remediation plan.
TALM uses the Progressing
condition to report the status and reasons as follows:
true
TALM is remediating non-compliant policies.
false
The update is not in progress. Possible reasons for this are:
- All clusters are compliant with all the managed policies.
- The update has timed out as policy remediation took too long.
- Blocking CRs are missing from the system or have not yet completed.
-
The
ClusterGroupUpgrade
CR is not enabled. - Backup is still in progress.
The managed policies apply in the order that they are listed in the managedPolicies
field in the ClusterGroupUpgrade
CR. One managed policy is applied to the specified clusters at a time. When a cluster complies with the current policy, the next managed policy is applied to it.
Sample ClusterGroupUpgrade
CR in the Progressing
state
- 1
- The
Progressing
fields show that TALM is in the process of remediating policies.
19.10.5.6. Update status Copy linkLink copied to clipboard!
TALM uses the Succeeded
condition to report the status and reasons as follows:
true
All clusters are compliant with the specified managed policies.
false
Policy remediation failed as there were no clusters available for remediation, or because policy remediation took too long for one of the following reasons:
- The current batch contains canary updates and the cluster in the batch does not comply with all the managed policies within the batch timeout.
-
Clusters did not comply with the managed policies within the
timeout
value specified in theremediationStrategy
field.
Sample ClusterGroupUpgrade
CR in the Succeeded
state
- 2
- In the
Progressing
fields, the status isfalse
as the update has completed; clusters are compliant with all the managed policies. - 3
- The
Succeeded
fields show that the validations completed successfully. - 1
- The
status
field includes a list of clusters and their respective statuses. The status of a cluster can becomplete
ortimedout
.
Sample ClusterGroupUpgrade
CR in the timedout
state
19.10.5.7. Blocking ClusterGroupUpgrade CRs Copy linkLink copied to clipboard!
You can create multiple ClusterGroupUpgrade
CRs and control their order of application.
For example, if you create ClusterGroupUpgrade
CR C that blocks the start of ClusterGroupUpgrade
CR A, then ClusterGroupUpgrade
CR A cannot start until the status of ClusterGroupUpgrade
CR C becomes UpgradeComplete
.
One ClusterGroupUpgrade
CR can have multiple blocking CRs. In this case, all the blocking CRs must complete before the upgrade for the current CR can start.
Prerequisites
- Install the Topology Aware Lifecycle Manager (TALM).
- Provision one or more managed clusters.
-
Log in as a user with
cluster-admin
privileges. - Create RHACM policies in the hub cluster.
Procedure
Save the content of the
ClusterGroupUpgrade
CRs in thecgu-a.yaml
,cgu-b.yaml
, andcgu-c.yaml
files.Copy to Clipboard Copied! Toggle word wrap Toggle overflow - 1
- Defines the blocking CRs. The
cgu-a
update cannot start untilcgu-c
is complete.
Copy to Clipboard Copied! Toggle word wrap Toggle overflow - 1
- The
cgu-b
update cannot start untilcgu-a
is complete.
Copy to Clipboard Copied! Toggle word wrap Toggle overflow - 1
- The
cgu-c
update does not have any blocking CRs. TALM starts thecgu-c
update when theenable
field is set totrue
.
Create the
ClusterGroupUpgrade
CRs by running the following command for each relevant CR:oc apply -f <name>.yaml
$ oc apply -f <name>.yaml
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Start the update process by running the following command for each relevant CR:
oc --namespace=default patch clustergroupupgrade.ran.openshift.io/<name> \ --type merge -p '{"spec":{"enable":true}}'
$ oc --namespace=default patch clustergroupupgrade.ran.openshift.io/<name> \ --type merge -p '{"spec":{"enable":true}}'
Copy to Clipboard Copied! Toggle word wrap Toggle overflow The following examples show
ClusterGroupUpgrade
CRs where theenable
field is set totrue
:Example for
cgu-a
with blocking CRsCopy to Clipboard Copied! Toggle word wrap Toggle overflow - 1
- Shows the list of blocking CRs.
Example for
cgu-b
with blocking CRsCopy to Clipboard Copied! Toggle word wrap Toggle overflow - 1
- Shows the list of blocking CRs.
Example for
cgu-c
with blocking CRsCopy to Clipboard Copied! Toggle word wrap Toggle overflow - 1
- The
cgu-c
update does not have any blocking CRs.
19.10.6. Update policies on managed clusters Copy linkLink copied to clipboard!
The Topology Aware Lifecycle Manager (TALM) remediates a set of inform
policies for the clusters specified in the ClusterGroupUpgrade
CR. TALM remediates inform
policies by making enforce
copies of the managed RHACM policies. Each copied policy has its own corresponding RHACM placement rule and RHACM placement binding.
One by one, TALM adds each cluster from the current batch to the placement rule that corresponds with the applicable managed policy. If a cluster is already compliant with a policy, TALM skips applying that policy on the compliant cluster. TALM then moves on to applying the next policy to the non-compliant cluster. After TALM completes the updates in a batch, all clusters are removed from the placement rules associated with the copied policies. Then, the update of the next batch starts.
If a spoke cluster does not report any compliant state to RHACM, the managed policies on the hub cluster can be missing status information that TALM needs. TALM handles these cases in the following ways:
-
If a policy’s
status.compliant
field is missing, TALM ignores the policy and adds a log entry. Then, TALM continues looking at the policy’sstatus.status
field. -
If a policy’s
status.status
is missing, TALM produces an error. -
If a cluster’s compliance status is missing in the policy’s
status.status
field, TALM considers that cluster to be non-compliant with that policy.
The ClusterGroupUpgrade
CR’s batchTimeoutAction
determines what happens if an upgrade fails for a cluster. You can specify continue
to skip the failing cluster and continue to upgrade other clusters, or specify abort
to stop the policy remediation for all clusters. Once the timeout elapses, TALM removes all enforce policies to ensure that no further updates are made to clusters.
Example upgrade policy
For more information about RHACM policies, see Policy overview.
19.10.6.1. Configuring Operator subscriptions for managed clusters that you install with TALM Copy linkLink copied to clipboard!
Topology Aware Lifecycle Manager (TALM) can only approve the install plan for an Operator if the Subscription
custom resource (CR) of the Operator contains the status.state.AtLatestKnown
field.
Procedure
Add the
status.state.AtLatestKnown
field to theSubscription
CR of the Operator:Example Subscription CR
Copy to Clipboard Copied! Toggle word wrap Toggle overflow - 1
- The
status.state: AtLatestKnown
field is used for the latest Operator version available from the Operator catalog.
NoteWhen a new version of the Operator is available in the registry, the associated policy becomes non-compliant.
-
Apply the changed
Subscription
policy to your managed clusters with aClusterGroupUpgrade
CR.
19.10.6.2. Applying update policies to managed clusters Copy linkLink copied to clipboard!
You can update your managed clusters by applying your policies.
Prerequisites
- Install the Topology Aware Lifecycle Manager (TALM).
- Provision one or more managed clusters.
-
Log in as a user with
cluster-admin
privileges. - Create RHACM policies in the hub cluster.
Procedure
Save the contents of the
ClusterGroupUpgrade
CR in thecgu-1.yaml
file.Copy to Clipboard Copied! Toggle word wrap Toggle overflow - 1
- The name of the policies to apply.
- 2
- The list of clusters to update.
- 3
- The
maxConcurrency
field signifies the number of clusters updated at the same time. - 4
- The update timeout in minutes.
- 5
- Controls what happens if a batch times out. Possible values are
abort
orcontinue
. If unspecified, the default iscontinue
.
Create the
ClusterGroupUpgrade
CR by running the following command:oc create -f cgu-1.yaml
$ oc create -f cgu-1.yaml
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Check if the
ClusterGroupUpgrade
CR was created in the hub cluster by running the following command:oc get cgu --all-namespaces
$ oc get cgu --all-namespaces
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
NAMESPACE NAME AGE STATE DETAILS default cgu-1 8m55 NotEnabled Not Enabled
NAMESPACE NAME AGE STATE DETAILS default cgu-1 8m55 NotEnabled Not Enabled
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Check the status of the update by running the following command:
oc get cgu -n default cgu-1 -ojsonpath='{.status}' | jq
$ oc get cgu -n default cgu-1 -ojsonpath='{.status}' | jq
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
Copy to Clipboard Copied! Toggle word wrap Toggle overflow - 1
- The
spec.enable
field in theClusterGroupUpgrade
CR is set tofalse
.
Check the status of the policies by running the following command:
oc get policies -A
$ oc get policies -A
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
Copy to Clipboard Copied! Toggle word wrap Toggle overflow - 1
- The
spec.remediationAction
field of policies currently applied on the clusters is set toenforce
. The managed policies ininform
mode from theClusterGroupUpgrade
CR remain ininform
mode during the update.
Change the value of the
spec.enable
field totrue
by running the following command:oc --namespace=default patch clustergroupupgrade.ran.openshift.io/cgu-1 \ --patch '{"spec":{"enable":true}}' --type=merge
$ oc --namespace=default patch clustergroupupgrade.ran.openshift.io/cgu-1 \ --patch '{"spec":{"enable":true}}' --type=merge
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
Verification
Check the status of the update again by running the following command:
oc get cgu -n default cgu-1 -ojsonpath='{.status}' | jq
$ oc get cgu -n default cgu-1 -ojsonpath='{.status}' | jq
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
Copy to Clipboard Copied! Toggle word wrap Toggle overflow - 1
- Reflects the update progress of the current batch. Run this command again to receive updated information about the progress.
If the policies include Operator subscriptions, you can check the installation progress directly on the single-node cluster.
Export the
KUBECONFIG
file of the single-node cluster you want to check the installation progress for by running the following command:export KUBECONFIG=<cluster_kubeconfig_absolute_path>
$ export KUBECONFIG=<cluster_kubeconfig_absolute_path>
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Check all the subscriptions present on the single-node cluster and look for the one in the policy you are trying to install through the
ClusterGroupUpgrade
CR by running the following command:oc get subs -A | grep -i <subscription_name>
$ oc get subs -A | grep -i <subscription_name>
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example output for
cluster-logging
policyNAMESPACE NAME PACKAGE SOURCE CHANNEL openshift-logging cluster-logging cluster-logging redhat-operators stable
NAMESPACE NAME PACKAGE SOURCE CHANNEL openshift-logging cluster-logging cluster-logging redhat-operators stable
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
If one of the managed policies includes a
ClusterVersion
CR, check the status of platform updates in the current batch by running the following command against the spoke cluster:oc get clusterversion
$ oc get clusterversion
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS version 4.4.12.5 True True 43s Working towards 4.4.12.7: 71 of 735 done (9% complete)
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS version 4.4.12.5 True True 43s Working towards 4.4.12.7: 71 of 735 done (9% complete)
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Check the Operator subscription by running the following command:
oc get subs -n <operator-namespace> <operator-subscription> -ojsonpath="{.status}"
$ oc get subs -n <operator-namespace> <operator-subscription> -ojsonpath="{.status}"
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Check the install plans present on the single-node cluster that is associated with the desired subscription by running the following command:
oc get installplan -n <subscription_namespace>
$ oc get installplan -n <subscription_namespace>
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example output for
cluster-logging
OperatorNAMESPACE NAME CSV APPROVAL APPROVED openshift-logging install-6khtw cluster-logging.5.3.3-4 Manual true
NAMESPACE NAME CSV APPROVAL APPROVED openshift-logging install-6khtw cluster-logging.5.3.3-4 Manual true
1 Copy to Clipboard Copied! Toggle word wrap Toggle overflow - 1
- The install plans have their
Approval
field set toManual
and theirApproved
field changes fromfalse
totrue
after TALM approves the install plan.
NoteWhen TALM is remediating a policy containing a subscription, it automatically approves any install plans attached to that subscription. Where multiple install plans are needed to get the operator to the latest known version, TALM might approve multiple install plans, upgrading through one or more intermediate versions to get to the final version.
Check if the cluster service version for the Operator of the policy that the
ClusterGroupUpgrade
is installing reached theSucceeded
phase by running the following command:oc get csv -n <operator_namespace>
$ oc get csv -n <operator_namespace>
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example output for OpenShift Logging Operator
NAME DISPLAY VERSION REPLACES PHASE cluster-logging.5.4.2 Red Hat OpenShift Logging 5.4.2 Succeeded
NAME DISPLAY VERSION REPLACES PHASE cluster-logging.5.4.2 Red Hat OpenShift Logging 5.4.2 Succeeded
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
19.10.7. Creating a backup of cluster resources before upgrade Copy linkLink copied to clipboard!
For single-node OpenShift, the Topology Aware Lifecycle Manager (TALM) can create a backup of a deployment before an upgrade. If the upgrade fails, you can recover the previous version and restore a cluster to a working state without requiring a reprovision of applications.
To use the backup feature you first create a ClusterGroupUpgrade
CR with the backup
field set to true
. To ensure that the contents of the backup are up to date, the backup is not taken until you set the enable
field in the ClusterGroupUpgrade
CR to true
.
TALM uses the BackupSucceeded
condition to report the status and reasons as follows:
true
Backup is completed for all clusters or the backup run has completed but failed for one or more clusters. If backup fails for any cluster, the update does not proceed for that cluster.
false
Backup is still in progress for one or more clusters or has failed for all clusters. The backup process running in the spoke clusters can have the following statuses:
PreparingToStart
The first reconciliation pass is in progress. The TALM deletes any spoke backup namespace and hub view resources that have been created in a failed upgrade attempt.
Starting
The backup prerequisites and backup job are being created.
Active
The backup is in progress.
Succeeded
The backup succeeded.
BackupTimeout
Artifact backup is partially done.
UnrecoverableError
The backup has ended with a non-zero exit code.
If the backup of a cluster fails and enters the BackupTimeout
or UnrecoverableError
state, the cluster update does not proceed for that cluster. Updates to other clusters are not affected and continue.
19.10.7.1. Creating a ClusterGroupUpgrade CR with backup Copy linkLink copied to clipboard!
You can create a backup of a deployment before an upgrade on single-node OpenShift clusters. If the upgrade fails you can use the upgrade-recovery.sh
script generated by Topology Aware Lifecycle Manager (TALM) to return the system to its preupgrade state. The backup consists of the following items:
- Cluster backup
-
A snapshot of
etcd
and static pod manifests. - Content backup
-
Backups of folders, for example,
/etc
,/usr/local
,/var/lib/kubelet
. - Changed files backup
-
Any files managed by
machine-config
that have been changed. - Deployment
-
A pinned
ostree
deployment. - Images (Optional)
- Any container images that are in use.
Prerequisites
- Install the Topology Aware Lifecycle Manager (TALM).
- Provision one or more managed clusters.
-
Log in as a user with
cluster-admin
privileges. - Install Red Hat Advanced Cluster Management (RHACM).
It is highly recommended that you create a recovery partition. The following is an example SiteConfig
custom resource (CR) for a recovery partition of 50 GB:
Procedure
Save the contents of the
ClusterGroupUpgrade
CR with thebackup
andenable
fields set totrue
in theclustergroupupgrades-group-du.yaml
file:Copy to Clipboard Copied! Toggle word wrap Toggle overflow To start the update, apply the
ClusterGroupUpgrade
CR by running the following command:oc apply -f clustergroupupgrades-group-du.yaml
$ oc apply -f clustergroupupgrades-group-du.yaml
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
Verification
Check the status of the upgrade in the hub cluster by running the following command:
oc get cgu -n ztp-group-du-sno du-upgrade-4918 -o jsonpath='{.status}'
$ oc get cgu -n ztp-group-du-sno du-upgrade-4918 -o jsonpath='{.status}'
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
19.10.7.2. Recovering a cluster after a failed upgrade Copy linkLink copied to clipboard!
If an upgrade of a cluster fails, you can manually log in to the cluster and use the backup to return the cluster to its preupgrade state. There are two stages:
- Rollback
- If the attempted upgrade included a change to the platform OS deployment, you must roll back to the previous version before running the recovery script.
A rollback is only applicable to upgrades from TALM and single-node OpenShift. This process does not apply to rollbacks from any other upgrade type.
- Recovery
- The recovery shuts down containers and uses files from the backup partition to relaunch containers and restore clusters.
Prerequisites
- Install the Topology Aware Lifecycle Manager (TALM).
- Provision one or more managed clusters.
- Install Red Hat Advanced Cluster Management (RHACM).
-
Log in as a user with
cluster-admin
privileges. - Run an upgrade that is configured for backup.
Procedure
Delete the previously created
ClusterGroupUpgrade
custom resource (CR) by running the following command:oc delete cgu/du-upgrade-4918 -n ztp-group-du-sno
$ oc delete cgu/du-upgrade-4918 -n ztp-group-du-sno
Copy to Clipboard Copied! Toggle word wrap Toggle overflow - Log in to the cluster that you want to recover.
Check the status of the platform OS deployment by running the following command:
ostree admin status
$ ostree admin status
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example outputs
ostree admin status
[root@lab-test-spoke2-node-0 core]# ostree admin status * rhcos c038a8f08458bbed83a77ece033ad3c55597e3f64edad66ea12fda18cbdceaf9.0 Version: 49.84.202202230006-0 Pinned: yes
1 origin refspec: c038a8f08458bbed83a77ece033ad3c55597e3f64edad66ea12fda18cbdceaf9
Copy to Clipboard Copied! Toggle word wrap Toggle overflow - 1
- The current deployment is pinned. A platform OS deployment rollback is not necessary.
Copy to Clipboard Copied! Toggle word wrap Toggle overflow To trigger a rollback of the platform OS deployment, run the following command:
rpm-ostree rollback -r
$ rpm-ostree rollback -r
Copy to Clipboard Copied! Toggle word wrap Toggle overflow The first phase of the recovery shuts down containers and restores files from the backup partition to the targeted directories. To begin the recovery, run the following command:
/var/recovery/upgrade-recovery.sh
$ /var/recovery/upgrade-recovery.sh
Copy to Clipboard Copied! Toggle word wrap Toggle overflow When prompted, reboot the cluster by running the following command:
systemctl reboot
$ systemctl reboot
Copy to Clipboard Copied! Toggle word wrap Toggle overflow After the reboot, restart the recovery by running the following command:
/var/recovery/upgrade-recovery.sh --resume
$ /var/recovery/upgrade-recovery.sh --resume
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
If the recovery utility fails, you can retry with the --restart
option:
/var/recovery/upgrade-recovery.sh --restart
$ /var/recovery/upgrade-recovery.sh --restart
Verification
To check the status of the recovery run the following command:
oc get clusterversion,nodes,clusteroperator
$ oc get clusterversion,nodes,clusteroperator
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
19.10.8. Using the container image pre-cache feature Copy linkLink copied to clipboard!
Single-node OpenShift clusters might have limited bandwidth to access the container image registry, which can cause a timeout before the updates are completed.
The time of the update is not set by TALM. You can apply the ClusterGroupUpgrade
CR at the beginning of the update by manual application or by external automation.
The container image pre-caching starts when the preCaching
field is set to true
in the ClusterGroupUpgrade
CR.
TALM uses the PrecacheSpecValid
condition to report status information as follows:
true
The pre-caching spec is valid and consistent.
false
The pre-caching spec is incomplete.
TALM uses the PrecachingSucceeded
condition to report status information as follows:
true
TALM has concluded the pre-caching process. If pre-caching fails for any cluster, the update fails for that cluster but proceeds for all other clusters. A message informs you if pre-caching has failed for any clusters.
false
Pre-caching is still in progress for one or more clusters or has failed for all clusters.
After a successful pre-caching process, you can start remediating policies. The remediation actions start when the enable
field is set to true
. If there is a pre-caching failure on a cluster, the upgrade fails for that cluster. The upgrade process continues for all other clusters that have a successful pre-cache.
The pre-caching process can be in the following statuses:
NotStarted
This is the initial state all clusters are automatically assigned to on the first reconciliation pass of the
ClusterGroupUpgrade
CR. In this state, TALM deletes any pre-caching namespace and hub view resources of spoke clusters that remain from previous incomplete updates. TALM then creates a newManagedClusterView
resource for the spoke pre-caching namespace to verify its deletion in thePrecachePreparing
state.PreparingToStart
Cleaning up any remaining resources from previous incomplete updates is in progress.
Starting
Pre-caching job prerequisites and the job are created.
Active
The job is in "Active" state.
Succeeded
The pre-cache job succeeded.
PrecacheTimeout
The artifact pre-caching is partially done.
UnrecoverableError
The job ends with a non-zero exit code.
19.10.8.1. Creating a ClusterGroupUpgrade CR with pre-caching Copy linkLink copied to clipboard!
For single-node OpenShift, the pre-cache feature allows the required container images to be present on the spoke cluster before the update starts.
Prerequisites
- Install the Topology Aware Lifecycle Manager (TALM).
- Provision one or more managed clusters.
-
Log in as a user with
cluster-admin
privileges.
Procedure
Save the contents of the
ClusterGroupUpgrade
CR with thepreCaching
field set totrue
in theclustergroupupgrades-group-du.yaml
file:Copy to Clipboard Copied! Toggle word wrap Toggle overflow - 1
- The
preCaching
field is set totrue
, which enables TALM to pull the container images before starting the update.
When you want to start pre-caching, apply the
ClusterGroupUpgrade
CR by running the following command:oc apply -f clustergroupupgrades-group-du.yaml
$ oc apply -f clustergroupupgrades-group-du.yaml
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
Verification
Check if the
ClusterGroupUpgrade
CR exists in the hub cluster by running the following command:oc get cgu -A
$ oc get cgu -A
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
NAMESPACE NAME AGE STATE DETAILS ztp-group-du-sno du-upgrade-4918 10s InProgress Precaching is required and not done
NAMESPACE NAME AGE STATE DETAILS ztp-group-du-sno du-upgrade-4918 10s InProgress Precaching is required and not done
1 Copy to Clipboard Copied! Toggle word wrap Toggle overflow - 1
- The CR is created.
Check the status of the pre-caching task by running the following command:
oc get cgu -n ztp-group-du-sno du-upgrade-4918 -o jsonpath='{.status}'
$ oc get cgu -n ztp-group-du-sno du-upgrade-4918 -o jsonpath='{.status}'
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
Copy to Clipboard Copied! Toggle word wrap Toggle overflow - 1
- Displays the list of identified clusters.
Check the status of the pre-caching job by running the following command on the spoke cluster:
oc get jobs,pods -n openshift-talo-pre-cache
$ oc get jobs,pods -n openshift-talo-pre-cache
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
NAME COMPLETIONS DURATION AGE job.batch/pre-cache 0/1 3m10s 3m10s NAME READY STATUS RESTARTS AGE pod/pre-cache--1-9bmlr 1/1 Running 0 3m10s
NAME COMPLETIONS DURATION AGE job.batch/pre-cache 0/1 3m10s 3m10s NAME READY STATUS RESTARTS AGE pod/pre-cache--1-9bmlr 1/1 Running 0 3m10s
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Check the status of the
ClusterGroupUpgrade
CR by running the following command:oc get cgu -n ztp-group-du-sno du-upgrade-4918 -o jsonpath='{.status}'
$ oc get cgu -n ztp-group-du-sno du-upgrade-4918 -o jsonpath='{.status}'
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
Copy to Clipboard Copied! Toggle word wrap Toggle overflow - 1
- The pre-cache tasks are done.
19.10.9. Troubleshooting the Topology Aware Lifecycle Manager Copy linkLink copied to clipboard!
The Topology Aware Lifecycle Manager (TALM) is an OpenShift Container Platform Operator that remediates RHACM policies. When issues occur, use the oc adm must-gather
command to gather details and logs and to take steps in debugging the issues.
For more information about related topics, see the following documentation:
- Red Hat Advanced Cluster Management for Kubernetes 2.4 Support Matrix
- Red Hat Advanced Cluster Management Troubleshooting
- The "Troubleshooting Operator issues" section
19.10.9.1. General troubleshooting Copy linkLink copied to clipboard!
You can determine the cause of the problem by reviewing the following questions:
Is the configuration that you are applying supported?
- Are the RHACM and the OpenShift Container Platform versions compatible?
- Are the TALM and RHACM versions compatible?
Which of the following components is causing the problem?
To ensure that the ClusterGroupUpgrade
configuration is functional, you can do the following:
-
Create the
ClusterGroupUpgrade
CR with thespec.enable
field set tofalse
. - Wait for the status to be updated and go through the troubleshooting questions.
-
If everything looks as expected, set the
spec.enable
field totrue
in theClusterGroupUpgrade
CR.
After you set the spec.enable
field to true
in the ClusterUpgradeGroup
CR, the update procedure starts and you cannot edit the CR’s spec
fields anymore.
19.10.9.2. Cannot modify the ClusterUpgradeGroup CR Copy linkLink copied to clipboard!
- Issue
-
You cannot edit the
ClusterUpgradeGroup
CR after enabling the update. - Resolution
Restart the procedure by performing the following steps:
Remove the old
ClusterGroupUpgrade
CR by running the following command:oc delete cgu -n <ClusterGroupUpgradeCR_namespace> <ClusterGroupUpgradeCR_name>
$ oc delete cgu -n <ClusterGroupUpgradeCR_namespace> <ClusterGroupUpgradeCR_name>
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Check and fix the existing issues with the managed clusters and policies.
- Ensure that all the clusters are managed clusters and available.
-
Ensure that all the policies exist and have the
spec.remediationAction
field set toinform
.
Create a new
ClusterGroupUpgrade
CR with the correct configurations.oc apply -f <ClusterGroupUpgradeCR_YAML>
$ oc apply -f <ClusterGroupUpgradeCR_YAML>
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
19.10.9.3. Managed policies Copy linkLink copied to clipboard!
Checking managed policies on the system
- Issue
- You want to check if you have the correct managed policies on the system.
- Resolution
Run the following command:
oc get cgu lab-upgrade -ojsonpath='{.spec.managedPolicies}'
$ oc get cgu lab-upgrade -ojsonpath='{.spec.managedPolicies}'
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
["group-du-sno-validator-du-validator-policy", "policy2-common-nto-sub-policy", "policy3-common-ptp-sub-policy"]
["group-du-sno-validator-du-validator-policy", "policy2-common-nto-sub-policy", "policy3-common-ptp-sub-policy"]
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
Checking remediationAction mode
- Issue
-
You want to check if the
remediationAction
field is set toinform
in thespec
of the managed policies. - Resolution
Run the following command:
oc get policies --all-namespaces
$ oc get policies --all-namespaces
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
NAMESPACE NAME REMEDIATION ACTION COMPLIANCE STATE AGE default policy1-common-cluster-version-policy inform NonCompliant 5d21h default policy2-common-nto-sub-policy inform Compliant 5d21h default policy3-common-ptp-sub-policy inform NonCompliant 5d21h default policy4-common-sriov-sub-policy inform NonCompliant 5d21h
NAMESPACE NAME REMEDIATION ACTION COMPLIANCE STATE AGE default policy1-common-cluster-version-policy inform NonCompliant 5d21h default policy2-common-nto-sub-policy inform Compliant 5d21h default policy3-common-ptp-sub-policy inform NonCompliant 5d21h default policy4-common-sriov-sub-policy inform NonCompliant 5d21h
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
Checking policy compliance state
- Issue
- You want to check the compliance state of policies.
- Resolution
Run the following command:
oc get policies --all-namespaces
$ oc get policies --all-namespaces
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
NAMESPACE NAME REMEDIATION ACTION COMPLIANCE STATE AGE default policy1-common-cluster-version-policy inform NonCompliant 5d21h default policy2-common-nto-sub-policy inform Compliant 5d21h default policy3-common-ptp-sub-policy inform NonCompliant 5d21h default policy4-common-sriov-sub-policy inform NonCompliant 5d21h
NAMESPACE NAME REMEDIATION ACTION COMPLIANCE STATE AGE default policy1-common-cluster-version-policy inform NonCompliant 5d21h default policy2-common-nto-sub-policy inform Compliant 5d21h default policy3-common-ptp-sub-policy inform NonCompliant 5d21h default policy4-common-sriov-sub-policy inform NonCompliant 5d21h
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
19.10.9.4. Clusters Copy linkLink copied to clipboard!
Checking if managed clusters are present
- Issue
-
You want to check if the clusters in the
ClusterGroupUpgrade
CR are managed clusters. - Resolution
Run the following command:
oc get managedclusters
$ oc get managedclusters
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
NAME HUB ACCEPTED MANAGED CLUSTER URLS JOINED AVAILABLE AGE local-cluster true https://api.hub.example.com:6443 True Unknown 13d spoke1 true https://api.spoke1.example.com:6443 True True 13d spoke3 true https://api.spoke3.example.com:6443 True True 27h
NAME HUB ACCEPTED MANAGED CLUSTER URLS JOINED AVAILABLE AGE local-cluster true https://api.hub.example.com:6443 True Unknown 13d spoke1 true https://api.spoke1.example.com:6443 True True 13d spoke3 true https://api.spoke3.example.com:6443 True True 27h
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Alternatively, check the TALM manager logs:
Get the name of the TALM manager by running the following command:
oc get pod -n openshift-operators
$ oc get pod -n openshift-operators
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
NAME READY STATUS RESTARTS AGE cluster-group-upgrades-controller-manager-75bcc7484d-8k8xp 2/2 Running 0 45m
NAME READY STATUS RESTARTS AGE cluster-group-upgrades-controller-manager-75bcc7484d-8k8xp 2/2 Running 0 45m
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Check the TALM manager logs by running the following command:
oc logs -n openshift-operators \ cluster-group-upgrades-controller-manager-75bcc7484d-8k8xp -c manager
$ oc logs -n openshift-operators \ cluster-group-upgrades-controller-manager-75bcc7484d-8k8xp -c manager
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
ERROR controller-runtime.manager.controller.clustergroupupgrade Reconciler error {"reconciler group": "ran.openshift.io", "reconciler kind": "ClusterGroupUpgrade", "name": "lab-upgrade", "namespace": "default", "error": "Cluster spoke5555 is not a ManagedCluster"} sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
ERROR controller-runtime.manager.controller.clustergroupupgrade Reconciler error {"reconciler group": "ran.openshift.io", "reconciler kind": "ClusterGroupUpgrade", "name": "lab-upgrade", "namespace": "default", "error": "Cluster spoke5555 is not a ManagedCluster"}
1 sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
Copy to Clipboard Copied! Toggle word wrap Toggle overflow - 1
- The error message shows that the cluster is not a managed cluster.
Checking if managed clusters are available
- Issue
-
You want to check if the managed clusters specified in the
ClusterGroupUpgrade
CR are available. - Resolution
Run the following command:
oc get managedclusters
$ oc get managedclusters
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
NAME HUB ACCEPTED MANAGED CLUSTER URLS JOINED AVAILABLE AGE local-cluster true https://api.hub.testlab.com:6443 True Unknown 13d spoke1 true https://api.spoke1.testlab.com:6443 True True 13d spoke3 true https://api.spoke3.testlab.com:6443 True True 27h
NAME HUB ACCEPTED MANAGED CLUSTER URLS JOINED AVAILABLE AGE local-cluster true https://api.hub.testlab.com:6443 True Unknown 13d spoke1 true https://api.spoke1.testlab.com:6443 True True 13d
1 spoke3 true https://api.spoke3.testlab.com:6443 True True 27h
2 Copy to Clipboard Copied! Toggle word wrap Toggle overflow
Checking clusterLabelSelector
- Issue
-
You want to check if the
clusterLabelSelector
field specified in theClusterGroupUpgrade
CR matches at least one of the managed clusters. - Resolution
Run the following command:
oc get managedcluster --selector=upgrade=true
$ oc get managedcluster --selector=upgrade=true
1 Copy to Clipboard Copied! Toggle word wrap Toggle overflow - 1
- The label for the clusters you want to update is
upgrade:true
.
Example output
NAME HUB ACCEPTED MANAGED CLUSTER URLS JOINED AVAILABLE AGE spoke1 true https://api.spoke1.testlab.com:6443 True True 13d spoke3 true https://api.spoke3.testlab.com:6443 True True 27h
NAME HUB ACCEPTED MANAGED CLUSTER URLS JOINED AVAILABLE AGE spoke1 true https://api.spoke1.testlab.com:6443 True True 13d spoke3 true https://api.spoke3.testlab.com:6443 True True 27h
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
Checking if canary clusters are present
- Issue
You want to check if the canary clusters are present in the list of clusters.
Example
ClusterGroupUpgrade
CRCopy to Clipboard Copied! Toggle word wrap Toggle overflow - Resolution
Run the following commands:
oc get cgu lab-upgrade -ojsonpath='{.spec.clusters}'
$ oc get cgu lab-upgrade -ojsonpath='{.spec.clusters}'
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
["spoke1", "spoke3"]
["spoke1", "spoke3"]
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Check if the canary clusters are present in the list of clusters that match
clusterLabelSelector
labels by running the following command:oc get managedcluster --selector=upgrade=true
$ oc get managedcluster --selector=upgrade=true
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
NAME HUB ACCEPTED MANAGED CLUSTER URLS JOINED AVAILABLE AGE spoke1 true https://api.spoke1.testlab.com:6443 True True 13d spoke3 true https://api.spoke3.testlab.com:6443 True True 27h
NAME HUB ACCEPTED MANAGED CLUSTER URLS JOINED AVAILABLE AGE spoke1 true https://api.spoke1.testlab.com:6443 True True 13d spoke3 true https://api.spoke3.testlab.com:6443 True True 27h
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
A cluster can be present in spec.clusters
and also be matched by the spec.clusterLabelSelector
label.
Checking the pre-caching status on spoke clusters
Check the status of pre-caching by running the following command on the spoke cluster:
oc get jobs,pods -n openshift-talo-pre-cache
$ oc get jobs,pods -n openshift-talo-pre-cache
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
19.10.9.5. Remediation Strategy Copy linkLink copied to clipboard!
Checking if remediationStrategy is present in the ClusterGroupUpgrade CR
- Issue
-
You want to check if the
remediationStrategy
is present in theClusterGroupUpgrade
CR. - Resolution
Run the following command:
oc get cgu lab-upgrade -ojsonpath='{.spec.remediationStrategy}'
$ oc get cgu lab-upgrade -ojsonpath='{.spec.remediationStrategy}'
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
{"maxConcurrency":2, "timeout":240}
{"maxConcurrency":2, "timeout":240}
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
Checking if maxConcurrency is specified in the ClusterGroupUpgrade CR
- Issue
-
You want to check if the
maxConcurrency
is specified in theClusterGroupUpgrade
CR. - Resolution
Run the following command:
oc get cgu lab-upgrade -ojsonpath='{.spec.remediationStrategy.maxConcurrency}'
$ oc get cgu lab-upgrade -ojsonpath='{.spec.remediationStrategy.maxConcurrency}'
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
2
2
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
19.10.9.6. Topology Aware Lifecycle Manager Copy linkLink copied to clipboard!
Checking condition message and status in the ClusterGroupUpgrade CR
- Issue
-
You want to check the value of the
status.conditions
field in theClusterGroupUpgrade
CR. - Resolution
Run the following command:
oc get cgu lab-upgrade -ojsonpath='{.status.conditions}'
$ oc get cgu lab-upgrade -ojsonpath='{.status.conditions}'
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
{"lastTransitionTime":"2022-02-17T22:25:28Z", "message":"Missing managed policies:[policyList]", "reason":"NotAllManagedPoliciesExist", "status":"False", "type":"Validated"}
{"lastTransitionTime":"2022-02-17T22:25:28Z", "message":"Missing managed policies:[policyList]", "reason":"NotAllManagedPoliciesExist", "status":"False", "type":"Validated"}
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
Checking corresponding copied policies
- Issue
-
You want to check if every policy from
status.managedPoliciesForUpgrade
has a corresponding policy instatus.copiedPolicies
. - Resolution
Run the following command:
oc get cgu lab-upgrade -oyaml
$ oc get cgu lab-upgrade -oyaml
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
Checking if status.remediationPlan was computed
- Issue
-
You want to check if
status.remediationPlan
is computed. - Resolution
Run the following command:
oc get cgu lab-upgrade -ojsonpath='{.status.remediationPlan}'
$ oc get cgu lab-upgrade -ojsonpath='{.status.remediationPlan}'
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
[["spoke2", "spoke3"]]
[["spoke2", "spoke3"]]
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
Errors in the TALM manager container
- Issue
- You want to check the logs of the manager container of TALM.
- Resolution
Run the following command:
oc logs -n openshift-operators \ cluster-group-upgrades-controller-manager-75bcc7484d-8k8xp -c manager
$ oc logs -n openshift-operators \ cluster-group-upgrades-controller-manager-75bcc7484d-8k8xp -c manager
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
ERROR controller-runtime.manager.controller.clustergroupupgrade Reconciler error {"reconciler group": "ran.openshift.io", "reconciler kind": "ClusterGroupUpgrade", "name": "lab-upgrade", "namespace": "default", "error": "Cluster spoke5555 is not a ManagedCluster"} sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
ERROR controller-runtime.manager.controller.clustergroupupgrade Reconciler error {"reconciler group": "ran.openshift.io", "reconciler kind": "ClusterGroupUpgrade", "name": "lab-upgrade", "namespace": "default", "error": "Cluster spoke5555 is not a ManagedCluster"}
1 sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
Copy to Clipboard Copied! Toggle word wrap Toggle overflow - 1
- Displays the error.
Clusters are not compliant to some policies after a ClusterGroupUpgrade
CR has completed
- Issue
The policy compliance status that TALM uses to decide if remediation is needed has not yet fully updated for all clusters. This may be because:
- The CGU was run too soon after a policy was created or updated.
-
The remediation of a policy affects the compliance of subsequent policies in the
ClusterGroupUpgrade
CR.
- Resolution
-
Create a new and apply
ClusterGroupUpdate
CR with the same specification .
19.11. Updating managed clusters in a disconnected environment with the Topology Aware Lifecycle Manager Copy linkLink copied to clipboard!
You can use the Topology Aware Lifecycle Manager (TALM) to manage the software lifecycle of OpenShift Container Platform managed clusters. TALM uses Red Hat Advanced Cluster Management (RHACM) policies to perform changes on the target clusters.
19.11.1. Updating clusters in a disconnected environment Copy linkLink copied to clipboard!
You can upgrade managed clusters and Operators for managed clusters that you have deployed using GitOps ZTP and Topology Aware Lifecycle Manager (TALM).
19.11.1.1. Setting up the environment Copy linkLink copied to clipboard!
TALM can perform both platform and Operator updates.
You must mirror both the platform image and Operator images that you want to update to in your mirror registry before you can use TALM to update your disconnected clusters. Complete the following steps to mirror the images:
For platform updates, you must perform the following steps:
Mirror the desired OpenShift Container Platform image repository. Ensure that the desired platform image is mirrored by following the "Mirroring the OpenShift Container Platform image repository" procedure linked in the Additional resources. Save the contents of the
imageContentSources
section in theimageContentSources.yaml
file:Example output
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Save the image signature of the desired platform image that was mirrored. You must add the image signature to the
PolicyGenTemplate
CR for platform updates. To get the image signature, perform the following steps:Specify the desired OpenShift Container Platform tag by running the following command:
OCP_RELEASE_NUMBER=<release_version>
$ OCP_RELEASE_NUMBER=<release_version>
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Specify the architecture of the server by running the following command:
ARCHITECTURE=<server_architecture>
$ ARCHITECTURE=<server_architecture>
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Get the release image digest from Quay by running the following command
DIGEST="$(oc adm release info quay.io/openshift-release-dev/ocp-release:${OCP_RELEASE_NUMBER}-${ARCHITECTURE} | sed -n 's/Pull From: .*@//p')"
$ DIGEST="$(oc adm release info quay.io/openshift-release-dev/ocp-release:${OCP_RELEASE_NUMBER}-${ARCHITECTURE} | sed -n 's/Pull From: .*@//p')"
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Set the digest algorithm by running the following command:
DIGEST_ALGO="${DIGEST%%:*}"
$ DIGEST_ALGO="${DIGEST%%:*}"
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Set the digest signature by running the following command:
DIGEST_ENCODED="${DIGEST#*:}"
$ DIGEST_ENCODED="${DIGEST#*:}"
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Get the image signature from the mirror.openshift.com website by running the following command:
SIGNATURE_BASE64=$(curl -s "https://mirror.openshift.com/pub/openshift-v4/signatures/openshift/release/${DIGEST_ALGO}=${DIGEST_ENCODED}/signature-1" | base64 -w0 && echo)
$ SIGNATURE_BASE64=$(curl -s "https://mirror.openshift.com/pub/openshift-v4/signatures/openshift/release/${DIGEST_ALGO}=${DIGEST_ENCODED}/signature-1" | base64 -w0 && echo)
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Save the image signature to the
checksum-<OCP_RELEASE_NUMBER>.yaml
file by running the following commands:cat >checksum-${OCP_RELEASE_NUMBER}.yaml <<EOF ${DIGEST_ALGO}-${DIGEST_ENCODED}: ${SIGNATURE_BASE64} EOF
$ cat >checksum-${OCP_RELEASE_NUMBER}.yaml <<EOF ${DIGEST_ALGO}-${DIGEST_ENCODED}: ${SIGNATURE_BASE64} EOF
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
Prepare the update graph. You have two options to prepare the update graph:
Use the OpenShift Update Service.
For more information about how to set up the graph on the hub cluster, see Deploy the operator for OpenShift Update Service and Build the graph data init container.
Make a local copy of the upstream graph. Host the update graph on an
http
orhttps
server in the disconnected environment that has access to the managed cluster. To download the update graph, use the following command:curl -s https://api.openshift.com/api/upgrades_info/v1/graph?channel=stable-4.12 -o ~/upgrade-graph_stable-4.12
$ curl -s https://api.openshift.com/api/upgrades_info/v1/graph?channel=stable-4.12 -o ~/upgrade-graph_stable-4.12
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
For Operator updates, you must perform the following task:
- Mirror the Operator catalogs. Ensure that the desired operator images are mirrored by following the procedure in the "Mirroring Operator catalogs for use with disconnected clusters" section.
19.11.1.2. Performing a platform update Copy linkLink copied to clipboard!
You can perform a platform update with the TALM.
Prerequisites
- Install the Topology Aware Lifecycle Manager (TALM).
- Update ZTP to the latest version.
- Provision one or more managed clusters with ZTP.
- Mirror the desired image repository.
-
Log in as a user with
cluster-admin
privileges. - Create RHACM policies in the hub cluster.
Procedure
Create a
PolicyGenTemplate
CR for the platform update:Save the following contents of the
PolicyGenTemplate
CR in thedu-upgrade.yaml
file.Example of
PolicyGenTemplate
for platform updateCopy to Clipboard Copied! Toggle word wrap Toggle overflow - 1
- The
ConfigMap
CR contains the signature of the desired release image to update to. - 2
- Shows the image signature of the desired OpenShift Container Platform release. Get the signature from the
checksum-${OCP_RELEASE_NUMBER}.yaml
file you saved when following the procedures in the "Setting up the environment" section. - 3
- Shows the mirror repository that contains the desired OpenShift Container Platform image. Get the mirrors from the
imageContentSources.yaml
file that you saved when following the procedures in the "Setting up the environment" section. - 4
- Shows the
ClusterVersion
CR to trigger the update. Thechannel
,upstream
, anddesiredVersion
fields are all required for image pre-caching.
The
PolicyGenTemplate
CR generates two policies:-
The
du-upgrade-platform-upgrade-prep
policy does the preparation work for the platform update. It creates theConfigMap
CR for the desired release image signature, creates the image content source of the mirrored release image repository, and updates the cluster version with the desired update channel and the update graph reachable by the managed cluster in the disconnected environment. -
The
du-upgrade-platform-upgrade
policy is used to perform platform upgrade.
Add the
du-upgrade.yaml
file contents to thekustomization.yaml
file located in the ZTP Git repository for thePolicyGenTemplate
CRs and push the changes to the Git repository.ArgoCD pulls the changes from the Git repository and generates the policies on the hub cluster.
Check the created policies by running the following command:
oc get policies -A | grep platform-upgrade
$ oc get policies -A | grep platform-upgrade
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
Create the
ClusterGroupUpdate
CR for the platform update with thespec.enable
field set tofalse
.Save the content of the platform update
ClusterGroupUpdate
CR with thedu-upgrade-platform-upgrade-prep
and thedu-upgrade-platform-upgrade
policies and the target clusters to thecgu-platform-upgrade.yml
file, as shown in the following example:Copy to Clipboard Copied! Toggle word wrap Toggle overflow Apply the
ClusterGroupUpdate
CR to the hub cluster by running the following command:oc apply -f cgu-platform-upgrade.yml
$ oc apply -f cgu-platform-upgrade.yml
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
Optional: Pre-cache the images for the platform update.
Enable pre-caching in the
ClusterGroupUpdate
CR by running the following command:oc --namespace=default patch clustergroupupgrade.ran.openshift.io/cgu-platform-upgrade \ --patch '{"spec":{"preCaching": true}}' --type=merge
$ oc --namespace=default patch clustergroupupgrade.ran.openshift.io/cgu-platform-upgrade \ --patch '{"spec":{"preCaching": true}}' --type=merge
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Monitor the update process and wait for the pre-caching to complete. Check the status of pre-caching by running the following command on the hub cluster:
oc get cgu cgu-platform-upgrade -o jsonpath='{.status.precaching.status}'
$ oc get cgu cgu-platform-upgrade -o jsonpath='{.status.precaching.status}'
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
Start the platform update:
Enable the
cgu-platform-upgrade
policy and disable pre-caching by running the following command:oc --namespace=default patch clustergroupupgrade.ran.openshift.io/cgu-platform-upgrade \ --patch '{"spec":{"enable":true, "preCaching": false}}' --type=merge
$ oc --namespace=default patch clustergroupupgrade.ran.openshift.io/cgu-platform-upgrade \ --patch '{"spec":{"enable":true, "preCaching": false}}' --type=merge
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Monitor the process. Upon completion, ensure that the policy is compliant by running the following command:
oc get policies --all-namespaces
$ oc get policies --all-namespaces
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
19.11.1.3. Performing an Operator update Copy linkLink copied to clipboard!
You can perform an Operator update with the TALM.
Prerequisites
- Install the Topology Aware Lifecycle Manager (TALM).
- Update ZTP to the latest version.
- Provision one or more managed clusters with ZTP.
- Mirror the desired index image, bundle images, and all Operator images referenced in the bundle images.
-
Log in as a user with
cluster-admin
privileges. - Create RHACM policies in the hub cluster.
Procedure
Update the
PolicyGenTemplate
CR for the Operator update.Update the
du-upgrade
PolicyGenTemplate
CR with the following additional contents in thedu-upgrade.yaml
file:Copy to Clipboard Copied! Toggle word wrap Toggle overflow - 1
- The index image URL contains the desired Operator images. If the index images are always pushed to the same image name and tag, this change is not needed.
- 2
- Set how frequently the Operator Lifecycle Manager (OLM) polls the index image for new Operator versions with the
registryPoll.interval
field. This change is not needed if a new index image tag is always pushed for y-stream and z-stream Operator updates. TheregistryPoll.interval
field can be set to a shorter interval to expedite the update, however shorter intervals increase computational load. To counteract this, you can restoreregistryPoll.interval
to the default value once the update is complete.
This update generates one policy,
du-upgrade-operator-catsrc-policy
, to update theredhat-operators-disconnected
catalog source with the new index images that contain the desired Operators images.NoteIf you want to use the image pre-caching for Operators and there are Operators from a different catalog source other than
redhat-operators-disconnected
, you must perform the following tasks:- Prepare a separate catalog source policy with the new index image or registry poll interval update for the different catalog source.
- Prepare a separate subscription policy for the desired Operators that are from the different catalog source.
For example, the desired SRIOV-FEC Operator is available in the
certified-operators
catalog source. To update the catalog source and the Operator subscription, add the following contents to generate two policies,du-upgrade-fec-catsrc-policy
anddu-upgrade-subscriptions-fec-policy
:Copy to Clipboard Copied! Toggle word wrap Toggle overflow Remove the specified subscriptions channels in the common
PolicyGenTemplate
CR, if they exist. The default subscriptions channels from the ZTP image are used for the update.NoteThe default channel for the Operators applied through ZTP 4.12 is
stable
, except for theperformance-addon-operator
. As of OpenShift Container Platform 4.11, theperformance-addon-operator
functionality was moved to thenode-tuning-operator
. For the 4.10 release, the default channel for PAO isv4.10
. You can also specify the default channels in the commonPolicyGenTemplate
CR.Push the
PolicyGenTemplate
CRs updates to the ZTP Git repository.ArgoCD pulls the changes from the Git repository and generates the policies on the hub cluster.
Check the created policies by running the following command:
oc get policies -A | grep -E "catsrc-policy|subscription"
$ oc get policies -A | grep -E "catsrc-policy|subscription"
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
Apply the required catalog source updates before starting the Operator update.
Save the content of the
ClusterGroupUpgrade
CR namedoperator-upgrade-prep
with the catalog source policies and the target managed clusters to thecgu-operator-upgrade-prep.yml
file:Copy to Clipboard Copied! Toggle word wrap Toggle overflow Apply the policy to the hub cluster by running the following command:
oc apply -f cgu-operator-upgrade-prep.yml
$ oc apply -f cgu-operator-upgrade-prep.yml
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Monitor the update process. Upon completion, ensure that the policy is compliant by running the following command:
oc get policies -A | grep -E "catsrc-policy"
$ oc get policies -A | grep -E "catsrc-policy"
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
Create the
ClusterGroupUpgrade
CR for the Operator update with thespec.enable
field set tofalse
.Save the content of the Operator update
ClusterGroupUpgrade
CR with thedu-upgrade-operator-catsrc-policy
policy and the subscription policies created from the commonPolicyGenTemplate
and the target clusters to thecgu-operator-upgrade.yml
file, as shown in the following example:Copy to Clipboard Copied! Toggle word wrap Toggle overflow - 1
- The policy is needed by the image pre-caching feature to retrieve the operator images from the catalog source.
- 2
- The policy contains Operator subscriptions. If you have followed the structure and content of the reference
PolicyGenTemplates
, all Operator subscriptions are grouped into thecommon-subscriptions-policy
policy.
NoteOne
ClusterGroupUpgrade
CR can only pre-cache the images of the desired Operators defined in the subscription policy from one catalog source included in theClusterGroupUpgrade
CR. If the desired Operators are from different catalog sources, such as in the example of the SRIOV-FEC Operator, anotherClusterGroupUpgrade
CR must be created withdu-upgrade-fec-catsrc-policy
anddu-upgrade-subscriptions-fec-policy
policies for the SRIOV-FEC Operator images pre-caching and update.Apply the
ClusterGroupUpgrade
CR to the hub cluster by running the following command:oc apply -f cgu-operator-upgrade.yml
$ oc apply -f cgu-operator-upgrade.yml
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
Optional: Pre-cache the images for the Operator update.
Before starting image pre-caching, verify the subscription policy is
NonCompliant
at this point by running the following command:oc get policy common-subscriptions-policy -n <policy_namespace>
$ oc get policy common-subscriptions-policy -n <policy_namespace>
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
NAME REMEDIATION ACTION COMPLIANCE STATE AGE common-subscriptions-policy inform NonCompliant 27d
NAME REMEDIATION ACTION COMPLIANCE STATE AGE common-subscriptions-policy inform NonCompliant 27d
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Enable pre-caching in the
ClusterGroupUpgrade
CR by running the following command:oc --namespace=default patch clustergroupupgrade.ran.openshift.io/cgu-operator-upgrade \ --patch '{"spec":{"preCaching": true}}' --type=merge
$ oc --namespace=default patch clustergroupupgrade.ran.openshift.io/cgu-operator-upgrade \ --patch '{"spec":{"preCaching": true}}' --type=merge
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Monitor the process and wait for the pre-caching to complete. Check the status of pre-caching by running the following command on the managed cluster:
oc get cgu cgu-operator-upgrade -o jsonpath='{.status.precaching.status}'
$ oc get cgu cgu-operator-upgrade -o jsonpath='{.status.precaching.status}'
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Check if the pre-caching is completed before starting the update by running the following command:
oc get cgu -n default cgu-operator-upgrade -ojsonpath='{.status.conditions}' | jq
$ oc get cgu -n default cgu-operator-upgrade -ojsonpath='{.status.conditions}' | jq
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
Start the Operator update.
Enable the
cgu-operator-upgrade
ClusterGroupUpgrade
CR and disable pre-caching to start the Operator update by running the following command:oc --namespace=default patch clustergroupupgrade.ran.openshift.io/cgu-operator-upgrade \ --patch '{"spec":{"enable":true, "preCaching": false}}' --type=merge
$ oc --namespace=default patch clustergroupupgrade.ran.openshift.io/cgu-operator-upgrade \ --patch '{"spec":{"enable":true, "preCaching": false}}' --type=merge
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Monitor the process. Upon completion, ensure that the policy is compliant by running the following command:
oc get policies --all-namespaces
$ oc get policies --all-namespaces
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
19.11.1.3.1. Troubleshooting missed Operator updates due to out-of-date policy compliance states Copy linkLink copied to clipboard!
In some scenarios, Topology Aware Lifecycle Manager (TALM) might miss Operator updates due to an out-of-date policy compliance state.
After a catalog source update, it takes time for the Operator Lifecycle Manager (OLM) to update the subscription status. The status of the subscription policy might continue to show as compliant while TALM decides whether remediation is needed. As a result, the Operator specified in the subscription policy does not get upgraded.
To avoid this scenario, add another catalog source configuration to the PolicyGenTemplate
and specify this configuration in the subscription for any Operators that require an update.
Procedure
Add a catalog source configuration in the
PolicyGenTemplate
resource:Copy to Clipboard Copied! Toggle word wrap Toggle overflow Update the
Subscription
resource to point to the new configuration for Operators that require an update:Copy to Clipboard Copied! Toggle word wrap Toggle overflow - 1
- Enter the name of the additional catalog source configuration that you defined in the
PolicyGenTemplate
resource.
19.11.1.4. Performing a platform and an Operator update together Copy linkLink copied to clipboard!
You can perform a platform and an Operator update at the same time.
Prerequisites
- Install the Topology Aware Lifecycle Manager (TALM).
- Update ZTP to the latest version.
- Provision one or more managed clusters with ZTP.
-
Log in as a user with
cluster-admin
privileges. - Create RHACM policies in the hub cluster.
Procedure
-
Create the
PolicyGenTemplate
CR for the updates by following the steps described in the "Performing a platform update" and "Performing an Operator update" sections. Apply the prep work for the platform and the Operator update.
Save the content of the
ClusterGroupUpgrade
CR with the policies for platform update preparation work, catalog source updates, and target clusters to thecgu-platform-operator-upgrade-prep.yml
file, for example:Copy to Clipboard Copied! Toggle word wrap Toggle overflow Apply the
cgu-platform-operator-upgrade-prep.yml
file to the hub cluster by running the following command:oc apply -f cgu-platform-operator-upgrade-prep.yml
$ oc apply -f cgu-platform-operator-upgrade-prep.yml
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Monitor the process. Upon completion, ensure that the policy is compliant by running the following command:
oc get policies --all-namespaces
$ oc get policies --all-namespaces
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
Create the
ClusterGroupUpdate
CR for the platform and the Operator update with thespec.enable
field set tofalse
.Save the contents of the platform and Operator update
ClusterGroupUpdate
CR with the policies and the target clusters to thecgu-platform-operator-upgrade.yml
file, as shown in the following example:Copy to Clipboard Copied! Toggle word wrap Toggle overflow Apply the
cgu-platform-operator-upgrade.yml
file to the hub cluster by running the following command:oc apply -f cgu-platform-operator-upgrade.yml
$ oc apply -f cgu-platform-operator-upgrade.yml
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
Optional: Pre-cache the images for the platform and the Operator update.
Enable pre-caching in the
ClusterGroupUpgrade
CR by running the following command:oc --namespace=default patch clustergroupupgrade.ran.openshift.io/cgu-du-upgrade \ --patch '{"spec":{"preCaching": true}}' --type=merge
$ oc --namespace=default patch clustergroupupgrade.ran.openshift.io/cgu-du-upgrade \ --patch '{"spec":{"preCaching": true}}' --type=merge
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Monitor the update process and wait for the pre-caching to complete. Check the status of pre-caching by running the following command on the managed cluster:
oc get jobs,pods -n openshift-talm-pre-cache
$ oc get jobs,pods -n openshift-talm-pre-cache
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Check if the pre-caching is completed before starting the update by running the following command:
oc get cgu cgu-du-upgrade -ojsonpath='{.status.conditions}'
$ oc get cgu cgu-du-upgrade -ojsonpath='{.status.conditions}'
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
Start the platform and Operator update.
Enable the
cgu-du-upgrade
ClusterGroupUpgrade
CR to start the platform and the Operator update by running the following command:oc --namespace=default patch clustergroupupgrade.ran.openshift.io/cgu-du-upgrade \ --patch '{"spec":{"enable":true, "preCaching": false}}' --type=merge
$ oc --namespace=default patch clustergroupupgrade.ran.openshift.io/cgu-du-upgrade \ --patch '{"spec":{"enable":true, "preCaching": false}}' --type=merge
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Monitor the process. Upon completion, ensure that the policy is compliant by running the following command:
oc get policies --all-namespaces
$ oc get policies --all-namespaces
Copy to Clipboard Copied! Toggle word wrap Toggle overflow NoteThe CRs for the platform and Operator updates can be created from the beginning by configuring the setting to
spec.enable: true
. In this case, the update starts immediately after pre-caching completes and there is no need to manually enable the CR.Both pre-caching and the update create extra resources, such as policies, placement bindings, placement rules, managed cluster actions, and managed cluster view, to help complete the procedures. Setting the
afterCompletion.deleteObjects
field totrue
deletes all these resources after the updates complete.
19.11.1.5. Removing Performance Addon Operator subscriptions from deployed clusters Copy linkLink copied to clipboard!
In earlier versions of OpenShift Container Platform, the Performance Addon Operator provided automatic, low latency performance tuning for applications. In OpenShift Container Platform 4.11 or later, these functions are part of the Node Tuning Operator.
Do not install the Performance Addon Operator on clusters running OpenShift Container Platform 4.11 or later. If you upgrade to OpenShift Container Platform 4.11 or later, the Node Tuning Operator automatically removes the Performance Addon Operator.
You need to remove any policies that create Performance Addon Operator subscriptions to prevent a re-installation of the Operator.
The reference DU profile includes the Performance Addon Operator in the PolicyGenTemplate
CR common-ranGen.yaml
. To remove the subscription from deployed managed clusters, you must update common-ranGen.yaml
.
If you install Performance Addon Operator 4.10.3-5 or later on OpenShift Container Platform 4.11 or later, the Performance Addon Operator detects the cluster version and automatically hibernates to avoid interfering with the Node Tuning Operator functions. However, to ensure best performance, remove the Performance Addon Operator from your OpenShift Container Platform 4.11 clusters.
Prerequisites
- Create a Git repository where you manage your custom site configuration data. The repository must be accessible from the hub cluster and be defined as a source repository for ArgoCD.
- Update to OpenShift Container Platform 4.11 or later.
-
Log in as a user with
cluster-admin
privileges.
Procedure
Change the
complianceType
tomustnothave
for the Performance Addon Operator namespace, Operator group, and subscription in thecommon-ranGen.yaml
file.Copy to Clipboard Copied! Toggle word wrap Toggle overflow -
Merge the changes with your custom site repository and wait for the ArgoCD application to synchronize the change to the hub cluster. The status of the
common-subscriptions-policy
policy changes toNon-Compliant
. - Apply the change to your target clusters by using the Topology Aware Lifecycle Manager. For more information about rolling out configuration changes, see the "Additional resources" section.
Monitor the process. When the status of the
common-subscriptions-policy
policy for a target cluster isCompliant
, the Performance Addon Operator has been removed from the cluster. Get the status of thecommon-subscriptions-policy
by running the following command:oc get policy -n ztp-common common-subscriptions-policy
$ oc get policy -n ztp-common common-subscriptions-policy
Copy to Clipboard Copied! Toggle word wrap Toggle overflow -
Delete the Performance Addon Operator namespace, Operator group and subscription CRs from
.spec.sourceFiles
in thecommon-ranGen.yaml
file. - Merge the changes with your custom site repository and wait for the ArgoCD application to synchronize the change to the hub cluster. The policy remains compliant.
19.11.2. About the auto-created ClusterGroupUpgrade CR for ZTP Copy linkLink copied to clipboard!
TALM has a controller called ManagedClusterForCGU
that monitors the Ready
state of the ManagedCluster
CRs on the hub cluster and creates the ClusterGroupUpgrade
CRs for ZTP (zero touch provisioning).
For any managed cluster in the Ready
state without a "ztp-done" label applied, the ManagedClusterForCGU
controller automatically creates a ClusterGroupUpgrade
CR in the ztp-install
namespace with its associated RHACM policies that are created during the ZTP process. TALM then remediates the set of configuration policies that are listed in the auto-created ClusterGroupUpgrade
CR to push the configuration CRs to the managed cluster.
If the managed cluster has no bound policies when the cluster becomes Ready
, no ClusterGroupUpgrade
CR is created.
Example of an auto-created ClusterGroupUpgrade
CR for ZTP
19.12. Updating GitOps ZTP Copy linkLink copied to clipboard!
You can update the Gitops zero touch provisioning (ZTP) infrastructure independently from the hub cluster, Red Hat Advanced Cluster Management (RHACM), and the managed OpenShift Container Platform clusters.
You can update the Red Hat OpenShift GitOps Operator when new versions become available. When updating the GitOps ZTP plugin, review the updated files in the reference configuration and ensure that the changes meet your requirements.
19.12.1. Overview of the GitOps ZTP update process Copy linkLink copied to clipboard!
You can update GitOps zero touch provisioning (ZTP) for a fully operational hub cluster running an earlier version of the GitOps ZTP infrastructure. The update process avoids impact on managed clusters.
Any changes to policy settings, including adding recommended content, results in updated polices that must be rolled out to the managed clusters and reconciled.
At a high level, the strategy for updating the GitOps ZTP infrastructure is as follows:
-
Label all existing clusters with the
ztp-done
label. - Stop the ArgoCD applications.
- Install the new GitOps ZTP tools.
- Update required content and optional changes in the Git repository.
- Update and restart the application configuration.
19.12.2. Preparing for the upgrade Copy linkLink copied to clipboard!
Use the following procedure to prepare your site for the GitOps zero touch provisioning (ZTP) upgrade.
Procedure
- Get the latest version of the GitOps ZTP container that has the custom resources (CRs) used to configure Red Hat OpenShift GitOps for use with GitOps ZTP.
Extract the
argocd/deployment
directory by using the following commands:mkdir -p ./update
$ mkdir -p ./update
Copy to Clipboard Copied! Toggle word wrap Toggle overflow podman run --log-driver=none --rm registry.redhat.io/openshift4/ztp-site-generate-rhel8:v4.12 extract /home/ztp --tar | tar x -C ./update
$ podman run --log-driver=none --rm registry.redhat.io/openshift4/ztp-site-generate-rhel8:v4.12 extract /home/ztp --tar | tar x -C ./update
Copy to Clipboard Copied! Toggle word wrap Toggle overflow The
/update
directory contains the following subdirectories:-
update/extra-manifest
: contains the source CR files that theSiteConfig
CR uses to generate the extra manifestconfigMap
. -
update/source-crs
: contains the source CR files that thePolicyGenTemplate
CR uses to generate the Red Hat Advanced Cluster Management (RHACM) policies. -
update/argocd/deployment
: contains patches and YAML files to apply on the hub cluster for use in the next step of this procedure. -
update/argocd/example
: contains exampleSiteConfig
andPolicyGenTemplate
files that represent the recommended configuration.
-
Update the
clusters-app.yaml
andpolicies-app.yaml
files to reflect the name of your applications and the URL, branch, and path for your Git repository.If the upgrade includes changes that results in obsolete policies, the obsolete policies should be removed prior to performing the upgrade.
Diff the changes between the configuration and deployment source CRs in the
/update
folder and Git repo where you manage your fleet site CRs. Apply and push the required changes to your site repository.ImportantWhen you update GitOps ZTP to the latest version, you must apply the changes from the
update/argocd/deployment
directory to your site repository. Do not use older versions of theargocd/deployment/
files.
19.12.3. Labeling the existing clusters Copy linkLink copied to clipboard!
To ensure that existing clusters remain untouched by the tool updates, label all existing managed clusters with the ztp-done
label.
This procedure only applies when updating clusters that were not provisioned with Topology Aware Lifecycle Manager (TALM). Clusters that you provision with TALM are automatically labeled with ztp-done
.
Procedure
Find a label selector that lists the managed clusters that were deployed with zero touch provisioning (ZTP), such as
local-cluster!=true
:oc get managedcluster -l 'local-cluster!=true'
$ oc get managedcluster -l 'local-cluster!=true'
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Ensure that the resulting list contains all the managed clusters that were deployed with ZTP, and then use that selector to add the
ztp-done
label:oc label managedcluster -l 'local-cluster!=true' ztp-done=
$ oc label managedcluster -l 'local-cluster!=true' ztp-done=
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
19.12.4. Stopping the existing GitOps ZTP applications Copy linkLink copied to clipboard!
Removing the existing applications ensures that any changes to existing content in the Git repository are not rolled out until the new version of the tools is available.
Use the application files from the deployment
directory. If you used custom names for the applications, update the names in these files first.
Procedure
Perform a non-cascaded delete on the
clusters
application to leave all generated resources in place:oc delete -f update/argocd/deployment/clusters-app.yaml
$ oc delete -f update/argocd/deployment/clusters-app.yaml
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Perform a cascaded delete on the
policies
application to remove all previous policies:oc patch -f policies-app.yaml -p '{"metadata": {"finalizers": ["resources-finalizer.argocd.argoproj.io"]}}' --type merge
$ oc patch -f policies-app.yaml -p '{"metadata": {"finalizers": ["resources-finalizer.argocd.argoproj.io"]}}' --type merge
Copy to Clipboard Copied! Toggle word wrap Toggle overflow oc delete -f update/argocd/deployment/policies-app.yaml
$ oc delete -f update/argocd/deployment/policies-app.yaml
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
19.12.5. Required changes to the Git repository Copy linkLink copied to clipboard!
When upgrading the ztp-site-generate
container from an earlier release of GitOps ZTP to v4.10 or later, there are additional requirements for the contents of the Git repository. Existing content in the repository must be updated to reflect these changes.
Make required changes to
PolicyGenTemplate
files:All
PolicyGenTemplate
files must be created in aNamespace
prefixed withztp
. This ensures that the GitOps zero touch provisioning (ZTP) application is able to manage the policy CRs generated by GitOps ZTP without conflicting with the way Red Hat Advanced Cluster Management (RHACM) manages the policies internally.Add the
kustomization.yaml
file to the repository:All
SiteConfig
andPolicyGenTemplate
CRs must be included in akustomization.yaml
file under their respective directory trees. For example:Copy to Clipboard Copied! Toggle word wrap Toggle overflow NoteThe files listed in the
generator
sections must contain eitherSiteConfig
orPolicyGenTemplate
CRs only. If your existing YAML files contain other CRs, for example,Namespace
, these other CRs must be pulled out into separate files and listed in theresources
section.The
PolicyGenTemplate
kustomization file must contain allPolicyGenTemplate
YAML files in thegenerator
section andNamespace
CRs in theresources
section. For example:Copy to Clipboard Copied! Toggle word wrap Toggle overflow The
SiteConfig
kustomization file must contain allSiteConfig
YAML files in thegenerator
section and any other CRs in the resources:Copy to Clipboard Copied! Toggle word wrap Toggle overflow Remove the
pre-sync.yaml
andpost-sync.yaml
files.In OpenShift Container Platform 4.10 and later, the
pre-sync.yaml
andpost-sync.yaml
files are no longer required. Theupdate/deployment/kustomization.yaml
CR manages the policies deployment on the hub cluster.NoteThere is a set of
pre-sync.yaml
andpost-sync.yaml
files under both theSiteConfig
andPolicyGenTemplate
trees.Review and incorporate recommended changes
Each release may include additional recommended changes to the configuration applied to deployed clusters. Typically these changes result in lower CPU use by the OpenShift platform, additional features, or improved tuning of the platform.
Review the reference
SiteConfig
andPolicyGenTemplate
CRs applicable to the types of cluster in your network. These examples can be found in theargocd/example
directory extracted from the GitOps ZTP container.
19.12.6. Installing the new GitOps ZTP applications Copy linkLink copied to clipboard!
Using the extracted argocd/deployment
directory, and after ensuring that the applications point to your site Git repository, apply the full contents of the deployment directory. Applying the full contents of the directory ensures that all necessary resources for the applications are correctly configured.
Procedure
To patch the ArgoCD instance in the hub cluster by using the patch file that you previously extracted into the
update/argocd/deployment/
directory, enter the following command:oc patch argocd openshift-gitops \ -n openshift-gitops --type=merge \ --patch-file update/argocd/deployment/argocd-openshift-gitops-patch.json
$ oc patch argocd openshift-gitops \ -n openshift-gitops --type=merge \ --patch-file update/argocd/deployment/argocd-openshift-gitops-patch.json
Copy to Clipboard Copied! Toggle word wrap Toggle overflow To apply the contents of the
argocd/deployment
directory, enter the following command:oc apply -k update/argocd/deployment
$ oc apply -k update/argocd/deployment
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
19.12.7. Rolling out the GitOps ZTP configuration changes Copy linkLink copied to clipboard!
If any configuration changes were included in the upgrade due to implementing recommended changes, the upgrade process results in a set of policy CRs on the hub cluster in the Non-Compliant
state. With the ZTP GitOps v4.10 and later ztp-site-generate
container, these policies are set to inform
mode and are not pushed to the managed clusters without an additional step by the user. This ensures that potentially disruptive changes to the clusters can be managed in terms of when the changes are made, for example, during a maintenance window, and how many clusters are updated concurrently.
To roll out the changes, create one or more ClusterGroupUpgrade
CRs as detailed in the TALM documentation. The CR must contain the list of Non-Compliant
policies that you want to push out to the managed clusters as well as a list or selector of which clusters should be included in the update.
19.13. Expanding single-node OpenShift clusters with GitOps ZTP Copy linkLink copied to clipboard!
You can expand single-node OpenShift clusters with GitOps ZTP. When you add worker nodes to single-node OpenShift clusters, the original single-node OpenShift cluster retains the control plane node role. Adding worker nodes does not require any downtime for the existing single-node OpenShift cluster.
Although there is no specified limit on the number of worker nodes that you can add to a single-node OpenShift cluster, you must revaluate the reserved CPU allocation on the control plane node for the additional worker nodes.
If you require workload partitioning on the worker node, you must deploy and remediate the managed cluster policies on the hub cluster before installing the node. This way, the workload partitioning MachineConfig
objects are rendered and associated with the worker
machine config pool before the GitOps ZTP workflow applies the MachineConfig
ignition file to the worker node.
It is recommended that you first remediate the policies, and then install the worker node. If you create the workload partitioning manifests after installing the worker node, you must drain the node manually and delete all the pods managed by daemon sets. When the managing daemon sets create the new pods, the new pods undergo the workload partitioning process.
Adding worker nodes to single-node OpenShift clusters with GitOps ZTP is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.
For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.
19.13.1. Applying profiles to the worker node Copy linkLink copied to clipboard!
You can configure the additional worker node with a DU profile.
You can apply a RAN distributed unit (DU) profile to the worker node cluster using the ZTP GitOps common, group, and site-specific PolicyGenTemplate
resources. The GitOps ZTP pipeline that is linked to the ArgoCD policies
application includes the following CRs that you can find in the out/argocd/example/policygentemplates
folder when you extract the ztp-site-generate
container:
-
common-ranGen.yaml
-
group-du-sno-ranGen.yaml
-
example-sno-site.yaml
-
ns.yaml
-
kustomization.yaml
Configuring the DU profile on the worker node is considered an upgrade. To initiate the upgrade flow, you must update the existing policies or create additional ones. Then, you must create a ClusterGroupUpgrade
CR to reconcile the policies in the group of clusters.
19.13.2. (Optional) Ensuring PTP and SR-IOV daemon selector compatibility Copy linkLink copied to clipboard!
If the DU profile was deployed using the GitOps ZTP plugin version 4.11 or earlier, the PTP and SR-IOV Operators might be configured to place the daemons only on nodes labelled as master
. This configuration prevents the PTP and SR-IOV daemons from operating on the worker node. If the PTP and SR-IOV daemon node selectors are incorrectly configured on your system, you must change the daemons before proceeding with the worker DU profile configuration.
Procedure
Check the daemon node selector settings of the PTP Operator on one of the spoke clusters:
oc get ptpoperatorconfig/default -n openshift-ptp -ojsonpath='{.spec}' | jq
$ oc get ptpoperatorconfig/default -n openshift-ptp -ojsonpath='{.spec}' | jq
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example output for PTP Operator
{"daemonNodeSelector":{"node-role.kubernetes.io/master":""}}
{"daemonNodeSelector":{"node-role.kubernetes.io/master":""}}
1 Copy to Clipboard Copied! Toggle word wrap Toggle overflow - 1
- If the node selector is set to
master
, the spoke was deployed with the version of the ZTP plugin that requires changes.
Check the daemon node selector settings of the SR-IOV Operator on one of the spoke clusters:
oc get sriovoperatorconfig/default -n \ openshift-sriov-network-operator -ojsonpath='{.spec}' | jq
$ oc get sriovoperatorconfig/default -n \ openshift-sriov-network-operator -ojsonpath='{.spec}' | jq
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example output for SR-IOV Operator
{"configDaemonNodeSelector":{"node-role.kubernetes.io/worker":""},"disableDrain":false,"enableInjector":true,"enableOperatorWebhook":true}
{"configDaemonNodeSelector":{"node-role.kubernetes.io/worker":""},"disableDrain":false,"enableInjector":true,"enableOperatorWebhook":true}
1 Copy to Clipboard Copied! Toggle word wrap Toggle overflow - 1
- If the node selector is set to
master
, the spoke was deployed with the version of the ZTP plugin that requires changes.
In the group policy, add the following
complianceType
andspec
entries:Copy to Clipboard Copied! Toggle word wrap Toggle overflow ImportantChanging the
daemonNodeSelector
field causes temporary PTP synchronization loss and SR-IOV connectivity loss.- Commit the changes in Git, and then push to the Git repository being monitored by the GitOps ZTP ArgoCD application.
19.13.3. PTP and SR-IOV node selector compatibility Copy linkLink copied to clipboard!
The PTP configuration resources and SR-IOV network node policies use node-role.kubernetes.io/master: ""
as the node selector. If the additional worker nodes have the same NIC configuration as the control plane node, the policies used to configure the control plane node can be reused for the worker nodes. However, the node selector must be changed to select both node types, for example with the "node-role.kubernetes.io/worker"
label.
19.13.4. Using PolicyGenTemplate CRs to apply worker node policies to worker nodes Copy linkLink copied to clipboard!
You can create policies for worker nodes.
Procedure
Create the following policy template:
Copy to Clipboard Copied! Toggle word wrap Toggle overflow - 1
- The policies are applied to all clusters with this label.
- 2
- The
MCP
field must be set toworker
. - 3
- This generic
MachineConfig
CR is used to configure workload partitioning on the worker node. - 4
- The
cpu.isolated
andcpu.reserved
fields must be configured for each particular hardware platform. - 5
- The
cmdline_crash
CPU set must match thecpu.isolated
set in thePerformanceProfile
section.
A generic
MachineConfig
CR is used to configure workload partitioning on the worker node. You can generate the content ofcrio
andkubelet
configuration files.-
Add the created policy template to the Git repository monitored by the ArgoCD
policies
application. -
Add the policy in the
kustomization.yaml
file. - Commit the changes in Git, and then push to the Git repository being monitored by the GitOps ZTP ArgoCD application.
To remediate the new policies to your spoke cluster, create a TALM custom resource:
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
19.13.5. Adding worker nodes to single-node OpenShift clusters with GitOps ZTP Copy linkLink copied to clipboard!
You can add one or more worker nodes to existing single-node OpenShift clusters to increase available CPU resources in the cluster.
Prerequisites
- Install and configure RHACM 2.6 or later in an OpenShift Container Platform 4.11 or later bare-metal hub cluster
- Install Topology Aware Lifecycle Manager in the hub cluster
- Install Red Hat OpenShift GitOps in the hub cluster
-
Use the GitOps ZTP
ztp-site-generate
container image version 4.12 or later - Deploy a managed single-node OpenShift cluster with GitOps ZTP
- Configure the Central Infrastructure Management as described in the RHACM documentation
-
Configure the DNS serving the cluster to resolve the internal API endpoint
api-int.<cluster_name>.<base_domain>
Procedure
If you deployed your cluster by using the
example-sno.yaml
SiteConfig
manifest, add your new worker node to thespec.clusters['example-sno'].nodes
list:Copy to Clipboard Copied! Toggle word wrap Toggle overflow Create a BMC authentication secret for the new host, as referenced by the
bmcCredentialsName
field in thespec.nodes
section of yourSiteConfig
file:Copy to Clipboard Copied! Toggle word wrap Toggle overflow Commit the changes in Git, and then push to the Git repository that is being monitored by the GitOps ZTP ArgoCD application.
When the ArgoCD
cluster
application synchronizes, two new manifests appear on the hub cluster generated by the ZTP plugin:-
BareMetalHost
NMStateConfig
ImportantThe
cpuset
field should not be configured for the worker node. Workload partitioning for worker nodes is added through management policies after the node installation is complete.
-
Verification
You can monitor the installation process in several ways.
Check if the preprovisioning images are created by running the following command:
oc get ppimg -n example-sno
$ oc get ppimg -n example-sno
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
NAMESPACE NAME READY REASON example-sno example-sno True ImageCreated example-sno example-node2 True ImageCreated
NAMESPACE NAME READY REASON example-sno example-sno True ImageCreated example-sno example-node2 True ImageCreated
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Check the state of the bare-metal hosts:
oc get bmh -n example-sno
$ oc get bmh -n example-sno
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
NAME STATE CONSUMER ONLINE ERROR AGE example-sno provisioned true 69m example-node2 provisioning true 4m50s
NAME STATE CONSUMER ONLINE ERROR AGE example-sno provisioned true 69m example-node2 provisioning true 4m50s
1 Copy to Clipboard Copied! Toggle word wrap Toggle overflow - 1
- The
provisioning
state indicates that node booting from the installation media is in progress.
Continuously monitor the installation process:
Watch the agent install process by running the following command:
oc get agent -n example-sno --watch
$ oc get agent -n example-sno --watch
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
Copy to Clipboard Copied! Toggle word wrap Toggle overflow When the worker node installation is finished, the worker node certificates are approved automatically. At this point, the worker appears in the
ManagedClusterInfo
status. Run the following command to see the status:oc get managedclusterinfo/example-sno -n example-sno -o \ jsonpath='{range .status.nodeList[*]}{.name}{"\t"}{.conditions}{"\t"}{.labels}{"\n"}{end}'
$ oc get managedclusterinfo/example-sno -n example-sno -o \ jsonpath='{range .status.nodeList[*]}{.name}{"\t"}{.conditions}{"\t"}{.labels}{"\n"}{end}'
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
example-sno [{"status":"True","type":"Ready"}] {"node-role.kubernetes.io/master":"","node-role.kubernetes.io/worker":""} example-node2 [{"status":"True","type":"Ready"}] {"node-role.kubernetes.io/worker":""}
example-sno [{"status":"True","type":"Ready"}] {"node-role.kubernetes.io/master":"","node-role.kubernetes.io/worker":""} example-node2 [{"status":"True","type":"Ready"}] {"node-role.kubernetes.io/worker":""}
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
19.14. Pre-caching images for single-node OpenShift deployments Copy linkLink copied to clipboard!
In environments with limited bandwidth where you use the GitOps zero touch provisioning (ZTP) solution to deploy a large number of clusters, you want to avoid downloading all the images that are required for bootstrapping and installing OpenShift Container Platform. The limited bandwidth at remote single-node OpenShift sites can cause long deployment times. The factory-precaching-cli tool allows you to pre-stage servers before shipping them to the remote site for ZTP provisioning.
The factory-precaching-cli tool does the following:
- Downloads the RHCOS rootfs image that is required by the minimal ISO to boot.
-
Creates a partition from the installation disk labelled as
data
. - Formats the disk in xfs.
- Creates a GUID Partition Table (GPT) data partition at the end of the disk, where the size of the partition is configurable by the tool.
- Copies the container images required to install OpenShift Container Platform.
- Copies the container images required by ZTP to install OpenShift Container Platform.
- Optional: Copies Day-2 Operators to the partition.
The factory-precaching-cli tool is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.
For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.
19.14.1. Getting the factory-precaching-cli tool Copy linkLink copied to clipboard!
The factory-precaching-cli tool Go binary is publicly available in the Telco RAN tools container image. The factory-precaching-cli tool Go binary in the container image is executed on the server running an RHCOS live image using podman
. If you are working in a disconnected environment or have a private registry, you need to copy the image there so you can download the image to the server.
Procedure
Pull the factory-precaching-cli tool image by running the following command:
podman pull quay.io/openshift-kni/telco-ran-tools:latest
# podman pull quay.io/openshift-kni/telco-ran-tools:latest
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
Verification
To check that the tool is available, query the current version of the factory-precaching-cli tool Go binary:
podman run quay.io/openshift-kni/telco-ran-tools:latest -- factory-precaching-cli -v
# podman run quay.io/openshift-kni/telco-ran-tools:latest -- factory-precaching-cli -v
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
factory-precaching-cli version 20221018.120852+main.feecf17
factory-precaching-cli version 20221018.120852+main.feecf17
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
19.14.2. Booting from a live operating system image Copy linkLink copied to clipboard!
You can use the factory-precaching-cli tool with to boot servers where only one disk is available and external disk drive cannot be attached to the server.
RHCOS requires the disk to not be in use when the disk is about to be written with an RHCOS image.
Depending on the server hardware, you can mount the RHCOS live ISO on the blank server using one of the following methods:
- Using the Dell RACADM tool on a Dell server.
- Using the HPONCFG tool on a HP server.
- Using the Redfish BMC API.
It is recommended to automate the mounting procedure. To automate the procedure, you need to pull the required images and host them on a local HTTP server.
Prerequisites
- You powered up the host.
- You have network connectivity to the host.
This example procedure uses the Redfish BMC API to mount the RHCOS live ISO.
Mount the RHCOS live ISO:
Check virtual media status:
curl --globoff -H "Content-Type: application/json" -H \ "Accept: application/json" -k -X GET --user ${username_password} \ https://$BMC_ADDRESS/redfish/v1/Managers/Self/VirtualMedia/1 | python -m json.tool
$ curl --globoff -H "Content-Type: application/json" -H \ "Accept: application/json" -k -X GET --user ${username_password} \ https://$BMC_ADDRESS/redfish/v1/Managers/Self/VirtualMedia/1 | python -m json.tool
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Mount the ISO file as a virtual media:
curl --globoff -L -w "%{http_code} %{url_effective}\\n" -ku ${username_password} -H "Content-Type: application/json" -H "Accept: application/json" -d '{"Image": "http://[$HTTPd_IP]/RHCOS-live.iso"}' -X POST https://$BMC_ADDRESS/redfish/v1/Managers/Self/VirtualMedia/1/Actions/VirtualMedia.InsertMedia
$ curl --globoff -L -w "%{http_code} %{url_effective}\\n" -ku ${username_password} -H "Content-Type: application/json" -H "Accept: application/json" -d '{"Image": "http://[$HTTPd_IP]/RHCOS-live.iso"}' -X POST https://$BMC_ADDRESS/redfish/v1/Managers/Self/VirtualMedia/1/Actions/VirtualMedia.InsertMedia
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Set the boot order to boot from the virtual media once:
curl --globoff -L -w "%{http_code} %{url_effective}\\n" -ku ${username_password} -H "Content-Type: application/json" -H "Accept: application/json" -d '{"Boot":{ "BootSourceOverrideEnabled": "Once", "BootSourceOverrideTarget": "Cd", "BootSourceOverrideMode": "UEFI"}}' -X PATCH https://$BMC_ADDRESS/redfish/v1/Systems/Self
$ curl --globoff -L -w "%{http_code} %{url_effective}\\n" -ku ${username_password} -H "Content-Type: application/json" -H "Accept: application/json" -d '{"Boot":{ "BootSourceOverrideEnabled": "Once", "BootSourceOverrideTarget": "Cd", "BootSourceOverrideMode": "UEFI"}}' -X PATCH https://$BMC_ADDRESS/redfish/v1/Systems/Self
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
- Reboot and ensure that the server is booting from virtual media.
19.14.3. Partitioning the disk Copy linkLink copied to clipboard!
To run the full pre-caching process, you have to boot from a live ISO and use the factory-precaching-cli tool from a container image to partition and pre-cache all the artifacts required.
A live ISO or RHCOS live ISO is required because the disk must not be in use when the operating system (RHCOS) is written to the device during the provisioning. Single-disk servers can also be enabled with this procedure.
Prerequisites
- You have a disk that is not partitioned.
-
You have access to the
quay.io/openshift-kni/telco-ran-tools:latest
image. - You have enough storage to install OpenShift Container Platform and pre-cache the required images.
Procedure
Verify that the disk is cleared:
lsblk
# lsblk
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT loop0 7:0 0 93.8G 0 loop /run/ephemeral loop1 7:1 0 897.3M 1 loop /sysroot sr0 11:0 1 999M 0 rom /run/media/iso nvme0n1 259:1 0 1.5T 0 disk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT loop0 7:0 0 93.8G 0 loop /run/ephemeral loop1 7:1 0 897.3M 1 loop /sysroot sr0 11:0 1 999M 0 rom /run/media/iso nvme0n1 259:1 0 1.5T 0 disk
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Erase any file system, RAID or partition table signatures from the device:
wipefs -a /dev/nvme0n1
# wipefs -a /dev/nvme0n1
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
/dev/nvme0n1: 8 bytes were erased at offset 0x00000200 (gpt): 45 46 49 20 50 41 52 54 /dev/nvme0n1: 8 bytes were erased at offset 0x1749a955e00 (gpt): 45 46 49 20 50 41 52 54 /dev/nvme0n1: 2 bytes were erased at offset 0x000001fe (PMBR): 55 aa
/dev/nvme0n1: 8 bytes were erased at offset 0x00000200 (gpt): 45 46 49 20 50 41 52 54 /dev/nvme0n1: 8 bytes were erased at offset 0x1749a955e00 (gpt): 45 46 49 20 50 41 52 54 /dev/nvme0n1: 2 bytes were erased at offset 0x000001fe (PMBR): 55 aa
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
The tool fails if the disk is not empty because it uses partition number 1 of the device for pre-caching the artifacts.
19.14.3.1. Creating the partition Copy linkLink copied to clipboard!
Once the device is ready, you create a single partition and a GPT partition table. The partition is automatically labelled as data
and created at the end of the device. Otherwise, the partition will be overridden by the coreos-installer
.
The coreos-installer
requires the partition to be created at the end of the device and to be labelled as data
. Both requirements are necessary to save the partition when writing the RHCOS image to the disk.
Prerequisites
-
The container must run as
privileged
due to formatting host devices. -
You have to mount the
/dev
folder so that the process can be executed inside the container.
Procedure
In the following example, the size of the partition is 250 GiB due to allow pre-caching the DU profile for Day 2 Operators.
Run the container as
privileged
and partition the disk:podman run -v /dev:/dev --privileged \ --rm quay.io/openshift-kni/telco-ran-tools:latest -- \ factory-precaching-cli partition \ -d /dev/nvme0n1 \ -s 250
# podman run -v /dev:/dev --privileged \ --rm quay.io/openshift-kni/telco-ran-tools:latest -- \ factory-precaching-cli partition \
1 -d /dev/nvme0n1 \
2 -s 250
3 Copy to Clipboard Copied! Toggle word wrap Toggle overflow Check the storage information:
lsblk
# lsblk
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
Verification
You must verify that the following requirements are met:
- The device has a GPT partition table
- The partition uses the latest sectors of the device.
-
The partition is correctly labeled as
data
.
Query the disk status to verify that the disk is partitioned as expected:
gdisk -l /dev/nvme0n1
# gdisk -l /dev/nvme0n1
Example output
19.14.3.2. Mounting the partition Copy linkLink copied to clipboard!
After verifying that the disk is partitioned correctly, you can mount the device into /mnt
.
It is recommended to mount the device into /mnt
because that mounting point is used during ZTP preparation.
Verify that the partition is formatted as
xfs
:lsblk -f /dev/nvme0n1
# lsblk -f /dev/nvme0n1
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
NAME FSTYPE LABEL UUID MOUNTPOINT nvme0n1 └─nvme0n1p1 xfs 1bee8ea4-d6cf-4339-b690-a76594794071
NAME FSTYPE LABEL UUID MOUNTPOINT nvme0n1 └─nvme0n1p1 xfs 1bee8ea4-d6cf-4339-b690-a76594794071
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Mount the partition:
mount /dev/nvme0n1p1 /mnt/
# mount /dev/nvme0n1p1 /mnt/
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
Verification
Check that the partition is mounted:
lsblk
# lsblk
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
Copy to Clipboard Copied! Toggle word wrap Toggle overflow - 1
- The mount point is
/var/mnt
because the/mnt
folder in RHCOS is a link to/var/mnt
.
19.14.4. Downloading the images Copy linkLink copied to clipboard!
The factory-precaching-cli tool allows you to download the following images to your partitioned server:
- OpenShift Container Platform images
- Operator images that are included in the distributed unit (DU) profile for 5G RAN sites
- Operator images from disconnected registries
The list of available Operator images can vary in different OpenShift Container Platform releases.
19.14.4.1. Downloading with parallel workers Copy linkLink copied to clipboard!
The factory-precaching-cli tool uses parallel workers to download multiple images simultaneously. You can configure the number of workers with the --parallel
or -p
option. The default number is set to 80% of the available CPUs to the server.
Your login shell may be restricted to a subset of CPUs, which reduces the CPUs available to the container. To remove this restriction, you can precede your commands with taskset 0xffffffff
, for example:
taskset 0xffffffff podman run --rm quay.io/openshift-kni/telco-ran-tools:latest factory-precaching-cli download --help
# taskset 0xffffffff podman run --rm quay.io/openshift-kni/telco-ran-tools:latest factory-precaching-cli download --help
19.14.4.2. Preparing to download the OpenShift Container Platform images Copy linkLink copied to clipboard!
To download OpenShift Container Platform container images, you need to know the multicluster engine version. When you use the --du-profile
flag, you also need to specify the Red Hat Advanced Cluster Management (RHACM) version running in the hub cluster that is going to provision the single-node OpenShift.
Prerequisites
- You have RHACM and the multicluster engine Operator installed.
- You partitioned the storage device.
- You have enough space for the images on the partitioned device.
- You connected the bare-metal server to the Internet.
- You have a valid pull secret.
Procedure
Check the RHACM version and the multicluster engine version by running the following commands in the hub cluster:
oc get csv -A | grep -i advanced-cluster-management
$ oc get csv -A | grep -i advanced-cluster-management
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
open-cluster-management advanced-cluster-management.v2.6.3 Advanced Cluster Management for Kubernetes 2.6.3 advanced-cluster-management.v2.6.3 Succeeded
open-cluster-management advanced-cluster-management.v2.6.3 Advanced Cluster Management for Kubernetes 2.6.3 advanced-cluster-management.v2.6.3 Succeeded
Copy to Clipboard Copied! Toggle word wrap Toggle overflow oc get csv -A | grep -i multicluster-engine
$ oc get csv -A | grep -i multicluster-engine
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
multicluster-engine cluster-group-upgrades-operator.v0.0.3 cluster-group-upgrades-operator 0.0.3 Pending multicluster-engine multicluster-engine.v2.1.4 multicluster engine for Kubernetes 2.1.4 multicluster-engine.v2.0.3 Succeeded multicluster-engine openshift-gitops-operator.v1.5.7 Red Hat OpenShift GitOps 1.5.7 openshift-gitops-operator.v1.5.6-0.1664915551.p Succeeded multicluster-engine openshift-pipelines-operator-rh.v1.6.4 Red Hat OpenShift Pipelines 1.6.4 openshift-pipelines-operator-rh.v1.6.3 Succeeded
multicluster-engine cluster-group-upgrades-operator.v0.0.3 cluster-group-upgrades-operator 0.0.3 Pending multicluster-engine multicluster-engine.v2.1.4 multicluster engine for Kubernetes 2.1.4 multicluster-engine.v2.0.3 Succeeded multicluster-engine openshift-gitops-operator.v1.5.7 Red Hat OpenShift GitOps 1.5.7 openshift-gitops-operator.v1.5.6-0.1664915551.p Succeeded multicluster-engine openshift-pipelines-operator-rh.v1.6.4 Red Hat OpenShift Pipelines 1.6.4 openshift-pipelines-operator-rh.v1.6.3 Succeeded
Copy to Clipboard Copied! Toggle word wrap Toggle overflow To access the container registry, copy a valid pull secret on the server to be installed:
Create the
.docker
folder:mkdir /root/.docker
$ mkdir /root/.docker
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Copy the valid pull in the
config.json
file to the previously created.docker/
folder:cp config.json /root/.docker/config.json
$ cp config.json /root/.docker/config.json
1 Copy to Clipboard Copied! Toggle word wrap Toggle overflow - 1
/root/.docker/config.json
is the default path wherepodman
checks for the login credentials for the registry.
If you use a different registry to pull the required artifacts, you need to copy the proper pull secret. If the local registry uses TLS, you need to include the certificates from the registry as well.
19.14.4.3. Downloading the OpenShift Container Platform images Copy linkLink copied to clipboard!
The factory-precaching-cli tool allows you to pre-cache all the container images required to provision a specific OpenShift Container Platform release.
Procedure
Pre-cache the release by running the following command:
Copy to Clipboard Copied! Toggle word wrap Toggle overflow - 1
- Specifies the downloading function of the factory-precaching-cli tool.
- 2
- Defines the OpenShift Container Platform release version.
- 3
- Defines the RHACM version.
- 4
- Defines the multicluster engine version.
- 5
- Defines the folder where you want to download the images on the disk.
- 6
- Optional. Defines the repository where you store your additional images. These images are downloaded and pre-cached on the disk.
Example output
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
Verification
Check that all the images are compressed in the target folder of server:
ls -l /mnt
$ ls -l /mnt
1 Copy to Clipboard Copied! Toggle word wrap Toggle overflow - 1
- It is recommended that you pre-cache the images in the
/mnt
folder.
Example output
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
19.14.4.4. Downloading the Operator images Copy linkLink copied to clipboard!
You can also pre-cache Day-2 Operators used in the 5G Radio Access Network (RAN) Distributed Unit (DU) cluster configuration. The Day-2 Operators depend on the installed OpenShift Container Platform version.
You need to include the RHACM hub and multicluster engine Operator versions by using the --acm-version
and --mce-version
flags so the factory-precaching-cli tool can pre-cache the appropriate containers images for RHACM and the multicluster engine Operator.
Procedure
Pre-cache the Operator images:
Copy to Clipboard Copied! Toggle word wrap Toggle overflow - 1
- Specifies the downloading function of the factory-precaching-cli tool.
- 2
- Defines the OpenShift Container Platform release version.
- 3
- Defines the RHACM version.
- 4
- Defines the multicluster engine version.
- 5
- Defines the folder where you want to download the images on the disk.
- 6
- Optional. Defines the repository where you store your additional images. These images are downloaded and pre-cached on the disk.
- 7
- Specifies pre-caching the Operators included in the DU configuration.
Example output
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
19.14.4.5. Pre-caching custom images in disconnected environments Copy linkLink copied to clipboard!
The --generate-imageset
argument stops the factory-precaching-cli tool after the ImageSetConfiguration
custom resource (CR) is generated. This allows you to customize the ImageSetConfiguration
CR before downloading any images. After you customized the CR, you can use the --skip-imageset
argument to download the images that you specified in the ImageSetConfiguration
CR.
You can customize the ImageSetConfiguration
CR in the following ways:
- Add Operators and additional images
- Remove Operators and additional images
- Change Operator and catalog sources to local or disconnected registries
Procedure
Pre-cache the images:
Copy to Clipboard Copied! Toggle word wrap Toggle overflow - 1
- Specifies the downloading function of the factory-precaching-cli tool.
- 2
- Defines the OpenShift Container Platform release version.
- 3
- Defines the RHACM version.
- 4
- Defines the multicluster engine version.
- 5
- Defines the folder where you want to download the images on the disk.
- 6
- Optional. Defines the repository where you store your additional images. These images are downloaded and pre-cached on the disk.
- 7
- Specifies pre-caching the Operators included in the DU configuration.
- 8
- The
--generate-imageset
argument generates theImageSetConfiguration
CR only, which allows you to customize the CR.
Example output
Generated /mnt/imageset.yaml
Generated /mnt/imageset.yaml
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example ImageSetConfiguration CR
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Customize the catalog resource in the CR:
Copy to Clipboard Copied! Toggle word wrap Toggle overflow When you download images by using a local or disconnected registry, you have to first add certificates for the registries that you want to pull the content from.
To avoid any errors, copy the registry certificate into your server:
cp /tmp/eko4-ca.crt /etc/pki/ca-trust/source/anchors/.
# cp /tmp/eko4-ca.crt /etc/pki/ca-trust/source/anchors/.
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Then, update the certificates trust store:
update-ca-trust
# update-ca-trust
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Mount the host
/etc/pki
folder into the factory-cli image:Copy to Clipboard Copied! Toggle word wrap Toggle overflow - 1
- Specifies the downloading function of the factory-precaching-cli tool.
- 2
- Defines the OpenShift Container Platform release version.
- 3
- Defines the RHACM version.
- 4
- Defines the multicluster engine version.
- 5
- Defines the folder where you want to download the images on the disk.
- 6
- Optional. Defines the repository where you store your additional images. These images are downloaded and pre-cached on the disk.
- 7
- Specifies pre-caching the Operators included in the DU configuration.
- 8
- The
--skip-imageset
argument allows you to download the images that you specified in your customizedImageSetConfiguration
CR.
Download the images without generating a new
imageSetConfiguration
CR:podman run -v /mnt:/mnt -v /root/.docker:/root/.docker --privileged --rm quay.io/openshift-kni/telco-ran-tools:latest -- factory-precaching-cli download -r 4.12.0 \ --acm-version 2.6.3 --mce-version 2.1.4 -f /mnt \ --img quay.io/custom/repository \ --du-profile -s \ --skip-imageset
# podman run -v /mnt:/mnt -v /root/.docker:/root/.docker --privileged --rm quay.io/openshift-kni/telco-ran-tools:latest -- factory-precaching-cli download -r 4.12.0 \ --acm-version 2.6.3 --mce-version 2.1.4 -f /mnt \ --img quay.io/custom/repository \ --du-profile -s \ --skip-imageset
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
19.14.5. Pre-caching images in ZTP Copy linkLink copied to clipboard!
The SiteConfig
manifest defines how an OpenShift cluster is to be installed and configured. In the ZTP provisioning workflow, the factory-precaching-cli tool requires the following additional fields in the SiteConfig
manifest:
-
clusters.ignitionConfigOverride
-
nodes.installerArgs
-
nodes.ignitionConfigOverride
Example SiteConfig with additional fields
19.14.5.1. Understanding the clusters.ignitionConfigOverride field Copy linkLink copied to clipboard!
The clusters.ignitionConfigOverride
field adds a configuration in Ignition format during the ZTP discovery stage. The configuration includes systemd
services in the ISO mounted in virtual media. This way, the scripts are part of the discovery RHCOS live ISO and they can be used to load the Assisted Installer (AI) images.
systemd
services-
The
systemd
services arevar-mnt.mount
andprecache-images.services
. Theprecache-images.service
depends on the disk partition to be mounted in/var/mnt
by thevar-mnt.mount
unit. The service calls a script calledextract-ai.sh
. extract-ai.sh
-
The
extract-ai.sh
script extracts and loads the required images from the disk partition to the local container storage. When the script finishes successfully, you can use the images locally. agent-fix-bz1964591
-
The
agent-fix-bz1964591
script is a workaround for an AI issue. To prevent AI from removing the images, which can force theagent.service
to pull the images again from the registry, theagent-fix-bz1964591
script checks if the requested container images exist.
19.14.5.2. Understanding the nodes.installerArgs field Copy linkLink copied to clipboard!
The nodes.installerArgs
field allows you to configure how the coreos-installer
utility writes the RHCOS live ISO to disk. You need to indicate to save the disk partition labeled as data
because the artifacts saved in the data
partition are needed during the OpenShift Container Platform installation stage.
The extra parameters are passed directly to the coreos-installer
utility that writes the live RHCOS to disk. On the next reboot, the operating system starts from the disk.
You can pass several options to the coreos-installer
utility:
19.14.5.3. Understanding the nodes.ignitionConfigOverride field Copy linkLink copied to clipboard!
Similarly to clusters.ignitionConfigOverride
, the nodes.ignitionConfigOverride
field allows the addtion of configurations in Ignition format to the coreos-installer
utility, but at the OpenShift Container Platform installation stage. When the RHCOS is written to disk, the extra configuration included in the ZTP discovery ISO is no longer available. During the discovery stage, the extra configuration is stored in the memory of the live OS.
At this stage, the number of container images extracted and loaded is bigger than in the discovery stage. Depending on the OpenShift Container Platform release and whether you install the Day-2 Operators, the installation time can vary.
At the installation stage, the var-mnt.mount
and precache-ocp.services
systemd
services are used.
precache-ocp.service
The
precache-ocp.service
depends on the disk partition to be mounted in/var/mnt
by thevar-mnt.mount
unit. Theprecache-ocp.service
service calls a script calledextract-ocp.sh
.ImportantTo extract all the images before the OpenShift Container Platform installation, you must execute
precache-ocp.service
before executing themachine-config-daemon-pull.service
andnodeip-configuration.service
services.extract-ocp.sh
-
The
extract-ocp.sh
script extracts and loads the required images from the disk partition to the local container storage. When the script finishes successfully, you can use the images locally.
When you upload the SiteConfig
and the optional PolicyGenTemplates
custom resources (CRs) to the Git repo, which Argo CD is monitoring, you can start the ZTP workflow by syncing the CRs with the hub cluster.
19.14.6. Troubleshooting Copy linkLink copied to clipboard!
19.14.6.1. Rendered catalog is invalid Copy linkLink copied to clipboard!
When you download images by using a local or disconnected registry, you might see the The rendered catalog is invalid
error. This means that you are missing certificates of the new registry you want to pull content from.
The factory-precaching-cli tool image is built on a UBI RHEL image. Certificate paths and locations are the same on RHCOS.
Example error
Procedure
Copy the registry certificate into your server:
cp /tmp/eko4-ca.crt /etc/pki/ca-trust/source/anchors/.
# cp /tmp/eko4-ca.crt /etc/pki/ca-trust/source/anchors/.
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Update the certificates trust store:
update-ca-trust
# update-ca-trust
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Mount the host
/etc/pki
folder into the factory-cli image:podman run -v /mnt:/mnt -v /root/.docker:/root/.docker -v /etc/pki:/etc/pki --privileged -it --rm quay.io/openshift-kni/telco-ran-tools:latest -- \ factory-precaching-cli download -r 4.11.5 --acm-version 2.5.4 \ --mce-version 2.0.4 -f /mnt \--img quay.io/custom/repository
# podman run -v /mnt:/mnt -v /root/.docker:/root/.docker -v /etc/pki:/etc/pki --privileged -it --rm quay.io/openshift-kni/telco-ran-tools:latest -- \ factory-precaching-cli download -r 4.11.5 --acm-version 2.5.4 \ --mce-version 2.0.4 -f /mnt \--img quay.io/custom/repository --du-profile -s --skip-imageset
Copy to Clipboard Copied! Toggle word wrap Toggle overflow