Este conteúdo não está disponível no idioma selecionado.
Chapter 10. Managing cluster polices with PolicyGenTemplate resources
10.1. Configuring managed cluster policies by using PolicyGenTemplate resources
Applied Policy
custom resources (CRs) configure the managed clusters that you provision. You can customize how Red Hat Advanced Cluster Management (RHACM) uses PolicyGenTemplate
CRs to generate the applied Policy
CRs.
Using PolicyGenTemplate
CRs to manage and deploy polices to managed clusters will be deprecated in an upcoming OpenShift Container Platform release. Equivalent and improved functionality is available using Red Hat Advanced Cluster Management (RHACM) and PolicyGenerator
CRs.
For more information about PolicyGenerator
resources, see the RHACM Policy Generator documentation.
Additional resources
10.1.1. About the PolicyGenTemplate CRD
The PolicyGenTemplate
custom resource definition (CRD) tells the PolicyGen
policy generator what custom resources (CRs) to include in the cluster configuration, how to combine the CRs into the generated policies, and what items in those CRs need to be updated with overlay content.
The following example shows a PolicyGenTemplate
CR (common-du-ranGen.yaml
) extracted from the ztp-site-generate
reference container. The common-du-ranGen.yaml
file defines two Red Hat Advanced Cluster Management (RHACM) policies. The polices manage a collection of configuration CRs, one for each unique value of policyName
in the CR. common-du-ranGen.yaml
creates a single placement binding and a placement rule to bind the policies to clusters based on the labels listed in the spec.bindingRules
section.
Example PolicyGenTemplate CR - common-ranGen.yaml
apiVersion: ran.openshift.io/v1 kind: PolicyGenTemplate metadata: name: "common-latest" namespace: "ztp-common" spec: bindingRules: common: "true" 1 du-profile: "latest" sourceFiles: 2 - fileName: SriovSubscriptionNS.yaml policyName: "subscriptions-policy" - fileName: SriovSubscriptionOperGroup.yaml policyName: "subscriptions-policy" - fileName: SriovSubscription.yaml policyName: "subscriptions-policy" - fileName: SriovOperatorStatus.yaml policyName: "subscriptions-policy" - fileName: PtpSubscriptionNS.yaml policyName: "subscriptions-policy" - fileName: PtpSubscriptionOperGroup.yaml policyName: "subscriptions-policy" - fileName: PtpSubscription.yaml policyName: "subscriptions-policy" - fileName: PtpOperatorStatus.yaml policyName: "subscriptions-policy" - fileName: ClusterLogNS.yaml policyName: "subscriptions-policy" - fileName: ClusterLogOperGroup.yaml policyName: "subscriptions-policy" - fileName: ClusterLogSubscription.yaml policyName: "subscriptions-policy" - fileName: ClusterLogOperatorStatus.yaml policyName: "subscriptions-policy" - fileName: StorageNS.yaml policyName: "subscriptions-policy" - fileName: StorageOperGroup.yaml policyName: "subscriptions-policy" - fileName: StorageSubscription.yaml policyName: "subscriptions-policy" - fileName: StorageOperatorStatus.yaml policyName: "subscriptions-policy" - fileName: DefaultCatsrc.yaml 3 policyName: "config-policy" 4 metadata: name: redhat-operators-disconnected spec: displayName: disconnected-redhat-operators image: registry.example.com:5000/disconnected-redhat-operators/disconnected-redhat-operator-index:v4.9 - fileName: DisconnectedICSP.yaml policyName: "config-policy" spec: repositoryDigestMirrors: - mirrors: - registry.example.com:5000 source: registry.redhat.io
- 1
common: "true"
applies the policies to all clusters with this label.- 2
- Files listed under
sourceFiles
create the Operator policies for installed clusters. - 3
DefaultCatsrc.yaml
configures the catalog source for the disconnected registry.- 4
policyName: "config-policy"
configures Operator subscriptions. TheOperatorHub
CR disables the default and this CR replacesredhat-operators
with aCatalogSource
CR that points to the disconnected registry.
A PolicyGenTemplate
CR can be constructed with any number of included CRs. Apply the following example CR in the hub cluster to generate a policy containing a single CR:
apiVersion: ran.openshift.io/v1 kind: PolicyGenTemplate metadata: name: "group-du-sno" namespace: "ztp-group" spec: bindingRules: group-du-sno: "" mcp: "master" sourceFiles: - fileName: PtpConfigSlave.yaml policyName: "config-policy" metadata: name: "du-ptp-slave" spec: profile: - name: "slave" interface: "ens5f0" ptp4lOpts: "-2 -s --summary_interval -4" phc2sysOpts: "-a -r -n 24"
Using the source file PtpConfigSlave.yaml
as an example, the file defines a PtpConfig
CR. The generated policy for the PtpConfigSlave
example is named group-du-sno-config-policy
. The PtpConfig
CR defined in the generated group-du-sno-config-policy
is named du-ptp-slave
. The spec
defined in PtpConfigSlave.yaml
is placed under du-ptp-slave
along with the other spec
items defined under the source file.
The following example shows the group-du-sno-config-policy
CR:
apiVersion: policy.open-cluster-management.io/v1 kind: Policy metadata: name: group-du-ptp-config-policy namespace: groups-sub annotations: policy.open-cluster-management.io/categories: CM Configuration Management policy.open-cluster-management.io/controls: CM-2 Baseline Configuration policy.open-cluster-management.io/standards: NIST SP 800-53 spec: remediationAction: inform disabled: false policy-templates: - objectDefinition: apiVersion: policy.open-cluster-management.io/v1 kind: ConfigurationPolicy metadata: name: group-du-ptp-config-policy-config spec: remediationAction: inform severity: low namespaceselector: exclude: - kube-* include: - '*' object-templates: - complianceType: musthave objectDefinition: apiVersion: ptp.openshift.io/v1 kind: PtpConfig metadata: name: du-ptp-slave namespace: openshift-ptp spec: recommend: - match: - nodeLabel: node-role.kubernetes.io/worker-du priority: 4 profile: slave profile: - interface: ens5f0 name: slave phc2sysOpts: -a -r -n 24 ptp4lConf: | [global] # # Default Data Set # twoStepFlag 1 slaveOnly 0 priority1 128 priority2 128 domainNumber 24
10.1.2. Recommendations when customizing PolicyGenTemplate CRs
Consider the following best practices when customizing site configuration PolicyGenTemplate
custom resources (CRs):
-
Use as few policies as are necessary. Using fewer policies requires less resources. Each additional policy creates increased CPU load for the hub cluster and the deployed managed cluster. CRs are combined into policies based on the
policyName
field in thePolicyGenTemplate
CR. CRs in the samePolicyGenTemplate
which have the same value forpolicyName
are managed under a single policy. -
In disconnected environments, use a single catalog source for all Operators by configuring the registry as a single index containing all Operators. Each additional
CatalogSource
CR on the managed clusters increases CPU usage. -
MachineConfig
CRs should be included asextraManifests
in theSiteConfig
CR so that they are applied during installation. This can reduce the overall time taken until the cluster is ready to deploy applications. -
PolicyGenTemplate
CRs should override the channel field to explicitly identify the desired version. This ensures that changes in the source CR during upgrades does not update the generated subscription.
Additional resources
- For recommendations about scaling clusters with RHACM, see Performance and scalability.
When managing large numbers of spoke clusters on the hub cluster, minimize the number of policies to reduce resource consumption.
Grouping multiple configuration CRs into a single or limited number of policies is one way to reduce the overall number of policies on the hub cluster. When using the common, group, and site hierarchy of policies for managing site configuration, it is especially important to combine site-specific configurations into a single policy.
10.1.3. PolicyGenTemplate CRs for RAN deployments
Use PolicyGenTemplate
custom resources (CRs) to customize the configuration applied to the cluster by using the GitOps Zero Touch Provisioning (ZTP) pipeline. The PolicyGenTemplate
CR allows you to generate one or more policies to manage the set of configuration CRs on your fleet of clusters. The PolicyGenTemplate
CR identifies the set of managed CRs, bundles them into policies, builds the policy wrapping around those CRs, and associates the policies with clusters by using label binding rules.
The reference configuration, obtained from the GitOps ZTP container, is designed to provide a set of critical features and node tuning settings that ensure the cluster can support the stringent performance and resource utilization constraints typical of RAN (Radio Access Network) Distributed Unit (DU) applications. Changes or omissions from the baseline configuration can affect feature availability, performance, and resource utilization. Use the reference PolicyGenTemplate
CRs as the basis to create a hierarchy of configuration files tailored to your specific site requirements.
The baseline PolicyGenTemplate
CRs that are defined for RAN DU cluster configuration can be extracted from the GitOps ZTP ztp-site-generate
container. See "Preparing the GitOps ZTP site configuration repository" for further details.
The PolicyGenTemplate
CRs can be found in the ./out/argocd/example/policygentemplates
folder. The reference architecture has common, group, and site-specific configuration CRs. Each PolicyGenTemplate
CR refers to other CRs that can be found in the ./out/source-crs
folder.
The PolicyGenTemplate
CRs relevant to RAN cluster configuration are described below. Variants are provided for the group PolicyGenTemplate
CRs to account for differences in single-node, three-node compact, and standard cluster configurations. Similarly, site-specific configuration variants are provided for single-node clusters and multi-node (compact or standard) clusters. Use the group and site-specific configuration variants that are relevant for your deployment.
PolicyGenTemplate CR | Description |
---|---|
| Contains a set of CRs that get applied to multi-node clusters. These CRs configure SR-IOV features typical for RAN installations. |
| Contains a set of CRs that get applied to single-node OpenShift clusters. These CRs configure SR-IOV features typical for RAN installations. |
| Contains a set of common RAN policy configuration that get applied to multi-node clusters. |
| Contains a set of common RAN CRs that get applied to all clusters. These CRs subscribe to a set of operators providing cluster features typical for RAN as well as baseline cluster tuning. |
| Contains the RAN policies for three-node clusters only. |
| Contains the RAN policies for single-node clusters only. |
| Contains the RAN policies for standard three control-plane clusters. |
|
|
|
|
|
|
Additional resources
10.1.4. Customizing a managed cluster with PolicyGenTemplate CRs
Use the following procedure to customize the policies that get applied to the managed cluster that you provision using the GitOps Zero Touch Provisioning (ZTP) pipeline.
Prerequisites
-
You have installed the OpenShift CLI (
oc
). -
You have logged in to the hub cluster as a user with
cluster-admin
privileges. - You configured the hub cluster for generating the required installation and policy CRs.
- You created a Git repository where you manage your custom site configuration data. The repository must be accessible from the hub cluster and be defined as a source repository for the Argo CD application.
Procedure
Create a
PolicyGenTemplate
CR for site-specific configuration CRs.-
Choose the appropriate example for your CR from the
out/argocd/example/policygentemplates
folder, for example,example-sno-site.yaml
orexample-multinode-site.yaml
. Change the
spec.bindingRules
field in the example file to match the site-specific label included in theSiteConfig
CR. In the exampleSiteConfig
file, the site-specific label issites: example-sno
.NoteEnsure that the labels defined in your
PolicyGenTemplate
spec.bindingRules
field correspond to the labels that are defined in the related managed clustersSiteConfig
CR.- Change the content in the example file to match the desired configuration.
-
Choose the appropriate example for your CR from the
Optional: Create a
PolicyGenTemplate
CR for any common configuration CRs that apply to the entire fleet of clusters.-
Select the appropriate example for your CR from the
out/argocd/example/policygentemplates
folder, for example,common-ranGen.yaml
. - Change the content in the example file to match the required configuration.
-
Select the appropriate example for your CR from the
Optional: Create a
PolicyGenTemplate
CR for any group configuration CRs that apply to the certain groups of clusters in the fleet.Ensure that the content of the overlaid spec files matches your required end state. As a reference, the
out/source-crs
directory contains the full list of source-crs available to be included and overlaid by your PolicyGenTemplate templates.NoteDepending on the specific requirements of your clusters, you might need more than a single group policy per cluster type, especially considering that the example group policies each have a single
PerformancePolicy.yaml
file that can only be shared across a set of clusters if those clusters consist of identical hardware configurations.-
Select the appropriate example for your CR from the
out/argocd/example/policygentemplates
folder, for example,group-du-sno-ranGen.yaml
. - Change the content in the example file to match the required configuration.
-
Select the appropriate example for your CR from the
-
Optional. Create a validator inform policy
PolicyGenTemplate
CR to signal when the GitOps ZTP installation and configuration of the deployed cluster is complete. For more information, see "Creating a validator inform policy". Define all the policy namespaces in a YAML file similar to the example
out/argocd/example/policygentemplates/ns.yaml
file.ImportantDo not include the
Namespace
CR in the same file with thePolicyGenTemplate
CR.-
Add the
PolicyGenTemplate
CRs andNamespace
CR to thekustomization.yaml
file in the generators section, similar to the example shown inout/argocd/example/policygentemplateskustomization.yaml
. Commit the
PolicyGenTemplate
CRs,Namespace
CR, and associatedkustomization.yaml
file in your Git repository and push the changes.The ArgoCD pipeline detects the changes and begins the managed cluster deployment. You can push the changes to the
SiteConfig
CR and thePolicyGenTemplate
CR simultaneously.
Additional resources
10.1.5. Monitoring managed cluster policy deployment progress
The ArgoCD pipeline uses PolicyGenTemplate
CRs in Git to generate the RHACM policies and then sync them to the hub cluster. You can monitor the progress of the managed cluster policy synchronization after the assisted service installs OpenShift Container Platform on the managed cluster.
Prerequisites
-
You have installed the OpenShift CLI (
oc
). -
You have logged in to the hub cluster as a user with
cluster-admin
privileges.
Procedure
The Topology Aware Lifecycle Manager (TALM) applies the configuration policies that are bound to the cluster.
After the cluster installation is complete and the cluster becomes
Ready
, aClusterGroupUpgrade
CR corresponding to this cluster, with a list of ordered policies defined by theran.openshift.io/ztp-deploy-wave annotations
, is automatically created by the TALM. The cluster’s policies are applied in the order listed inClusterGroupUpgrade
CR.You can monitor the high-level progress of configuration policy reconciliation by using the following commands:
$ export CLUSTER=<clusterName>
$ oc get clustergroupupgrades -n ztp-install $CLUSTER -o jsonpath='{.status.conditions[-1:]}' | jq
Example output
{ "lastTransitionTime": "2022-11-09T07:28:09Z", "message": "Remediating non-compliant policies", "reason": "InProgress", "status": "True", "type": "Progressing" }
You can monitor the detailed cluster policy compliance status by using the RHACM dashboard or the command line.
To check policy compliance by using
oc
, run the following command:$ oc get policies -n $CLUSTER
Example output
NAME REMEDIATION ACTION COMPLIANCE STATE AGE ztp-common.common-config-policy inform Compliant 3h42m ztp-common.common-subscriptions-policy inform NonCompliant 3h42m ztp-group.group-du-sno-config-policy inform NonCompliant 3h42m ztp-group.group-du-sno-validator-du-policy inform NonCompliant 3h42m ztp-install.example1-common-config-policy-pjz9s enforce Compliant 167m ztp-install.example1-common-subscriptions-policy-zzd9k enforce NonCompliant 164m ztp-site.example1-config-policy inform NonCompliant 3h42m ztp-site.example1-perf-policy inform NonCompliant 3h42m
To check policy status from the RHACM web console, perform the following actions:
-
Click Governance
Find policies. - Click on a cluster policy to check its status.
-
Click Governance
When all of the cluster policies become compliant, GitOps ZTP installation and configuration for the cluster is complete. The ztp-done
label is added to the cluster.
In the reference configuration, the final policy that becomes compliant is the one defined in the *-du-validator-policy
policy. This policy, when compliant on a cluster, ensures that all cluster configuration, Operator installation, and Operator configuration is complete.
10.1.6. Validating the generation of configuration policy CRs
Policy
custom resources (CRs) are generated in the same namespace as the PolicyGenTemplate
from which they are created. The same troubleshooting flow applies to all policy CRs generated from a PolicyGenTemplate
regardless of whether they are ztp-common
, ztp-group
, or ztp-site
based, as shown using the following commands:
$ export NS=<namespace>
$ oc get policy -n $NS
The expected set of policy-wrapped CRs should be displayed.
If the policies failed synchronization, use the following troubleshooting steps.
Procedure
To display detailed information about the policies, run the following command:
$ oc describe -n openshift-gitops application policies
Check for
Status: Conditions:
to show the error logs. For example, setting an invalidsourceFile
entry tofileName:
generates the error shown below:Status: Conditions: Last Transition Time: 2021-11-26T17:21:39Z Message: rpc error: code = Unknown desc = `kustomize build /tmp/https___git.com/ran-sites/policies/ --enable-alpha-plugins` failed exit status 1: 2021/11/26 17:21:40 Error could not find test.yaml under source-crs/: no such file or directory Error: failure in plugin configured via /tmp/kust-plugin-config-52463179; exit status 1: exit status 1 Type: ComparisonError
Check for
Status: Sync:
. If there are log errors atStatus: Conditions:
, theStatus: Sync:
showsUnknown
orError
:Status: Sync: Compared To: Destination: Namespace: policies-sub Server: https://kubernetes.default.svc Source: Path: policies Repo URL: https://git.com/ran-sites/policies/.git Target Revision: master Status: Error
When Red Hat Advanced Cluster Management (RHACM) recognizes that policies apply to a
ManagedCluster
object, the policy CR objects are applied to the cluster namespace. Check to see if the policies were copied to the cluster namespace:$ oc get policy -n $CLUSTER
Example output
NAME REMEDIATION ACTION COMPLIANCE STATE AGE ztp-common.common-config-policy inform Compliant 13d ztp-common.common-subscriptions-policy inform Compliant 13d ztp-group.group-du-sno-config-policy inform Compliant 13d ztp-group.group-du-sno-validator-du-policy inform Compliant 13d ztp-site.example-sno-config-policy inform Compliant 13d
RHACM copies all applicable policies into the cluster namespace. The copied policy names have the format:
<PolicyGenTemplate.Namespace>.<PolicyGenTemplate.Name>-<policyName>
.Check the placement rule for any policies not copied to the cluster namespace. The
matchSelector
in thePlacementRule
for those policies should match labels on theManagedCluster
object:$ oc get PlacementRule -n $NS
Note the
PlacementRule
name appropriate for the missing policy, common, group, or site, using the following command:$ oc get PlacementRule -n $NS <placement_rule_name> -o yaml
- The status-decisions should include your cluster name.
-
The key-value pair of the
matchSelector
in the spec must match the labels on your managed cluster.
Check the labels on the
ManagedCluster
object by using the following command:$ oc get ManagedCluster $CLUSTER -o jsonpath='{.metadata.labels}' | jq
Check to see what policies are compliant by using the following command:
$ oc get policy -n $CLUSTER
If the
Namespace
,OperatorGroup
, andSubscription
policies are compliant but the Operator configuration policies are not, it is likely that the Operators did not install on the managed cluster. This causes the Operator configuration policies to fail to apply because the CRD is not yet applied to the spoke.
10.1.7. Restarting policy reconciliation
You can restart policy reconciliation when unexpected compliance issues occur, for example, when the ClusterGroupUpgrade
custom resource (CR) has timed out.
Procedure
A
ClusterGroupUpgrade
CR is generated in the namespaceztp-install
by the Topology Aware Lifecycle Manager after the managed cluster becomesReady
:$ export CLUSTER=<clusterName>
$ oc get clustergroupupgrades -n ztp-install $CLUSTER
If there are unexpected issues and the policies fail to become complaint within the configured timeout (the default is 4 hours), the status of the
ClusterGroupUpgrade
CR showsUpgradeTimedOut
:$ oc get clustergroupupgrades -n ztp-install $CLUSTER -o jsonpath='{.status.conditions[?(@.type=="Ready")]}'
A
ClusterGroupUpgrade
CR in theUpgradeTimedOut
state automatically restarts its policy reconciliation every hour. If you have changed your policies, you can start a retry immediately by deleting the existingClusterGroupUpgrade
CR. This triggers the automatic creation of a newClusterGroupUpgrade
CR that begins reconciling the policies immediately:$ oc delete clustergroupupgrades -n ztp-install $CLUSTER
Note that when the ClusterGroupUpgrade
CR completes with status UpgradeCompleted
and the managed cluster has the label ztp-done
applied, you can make additional configuration changes by using PolicyGenTemplate
. Deleting the existing ClusterGroupUpgrade
CR will not make the TALM generate a new CR.
At this point, GitOps ZTP has completed its interaction with the cluster and any further interactions should be treated as an update and a new ClusterGroupUpgrade
CR created for remediation of the policies.
Additional resources
-
For information about using Topology Aware Lifecycle Manager (TALM) to construct your own
ClusterGroupUpgrade
CR, see About the ClusterGroupUpgrade CR.
10.1.8. Changing applied managed cluster CRs using policies
You can remove content from a custom resource (CR) that is deployed in a managed cluster through a policy.
By default, all Policy
CRs created from a PolicyGenTemplate
CR have the complianceType
field set to musthave
. A musthave
policy without the removed content is still compliant because the CR on the managed cluster has all the specified content. With this configuration, when you remove content from a CR, TALM removes the content from the policy but the content is not removed from the CR on the managed cluster.
With the complianceType
field to mustonlyhave
, the policy ensures that the CR on the cluster is an exact match of what is specified in the policy.
Prerequisites
-
You have installed the OpenShift CLI (
oc
). -
You have logged in to the hub cluster as a user with
cluster-admin
privileges. - You have deployed a managed cluster from a hub cluster running RHACM.
- You have installed Topology Aware Lifecycle Manager on the hub cluster.
Procedure
Remove the content that you no longer need from the affected CRs. In this example, the
disableDrain: false
line was removed from theSriovOperatorConfig
CR.Example CR
apiVersion: sriovnetwork.openshift.io/v1 kind: SriovOperatorConfig metadata: name: default namespace: openshift-sriov-network-operator spec: configDaemonNodeSelector: "node-role.kubernetes.io/$mcp": "" disableDrain: true enableInjector: true enableOperatorWebhook: true
Change the
complianceType
of the affected policies tomustonlyhave
in thegroup-du-sno-ranGen.yaml
file.Example YAML
- fileName: SriovOperatorConfig.yaml policyName: "config-policy" complianceType: mustonlyhave
Create a
ClusterGroupUpdates
CR and specify the clusters that must receive the CR changes::Example ClusterGroupUpdates CR
apiVersion: ran.openshift.io/v1alpha1 kind: ClusterGroupUpgrade metadata: name: cgu-remove namespace: default spec: managedPolicies: - ztp-group.group-du-sno-config-policy enable: false clusters: - spoke1 - spoke2 remediationStrategy: maxConcurrency: 2 timeout: 240 batchTimeoutAction:
Create the
ClusterGroupUpgrade
CR by running the following command:$ oc create -f cgu-remove.yaml
When you are ready to apply the changes, for example, during an appropriate maintenance window, change the value of the
spec.enable
field totrue
by running the following command:$ oc --namespace=default patch clustergroupupgrade.ran.openshift.io/cgu-remove \ --patch '{"spec":{"enable":true}}' --type=merge
Verification
Check the status of the policies by running the following command:
$ oc get <kind> <changed_cr_name>
Example output
NAMESPACE NAME REMEDIATION ACTION COMPLIANCE STATE AGE default cgu-ztp-group.group-du-sno-config-policy enforce 17m default ztp-group.group-du-sno-config-policy inform NonCompliant 15h
When the
COMPLIANCE STATE
of the policy isCompliant
, it means that the CR is updated and the unwanted content is removed.Check that the policies are removed from the targeted clusters by running the following command on the managed clusters:
$ oc get <kind> <changed_cr_name>
If there are no results, the CR is removed from the managed cluster.
10.1.9. Indication of done for GitOps ZTP installations
GitOps Zero Touch Provisioning (ZTP) simplifies the process of checking the GitOps ZTP installation status for a cluster. The GitOps ZTP status moves through three phases: cluster installation, cluster configuration, and GitOps ZTP done.
- Cluster installation phase
-
The cluster installation phase is shown by the
ManagedClusterJoined
andManagedClusterAvailable
conditions in theManagedCluster
CR . If theManagedCluster
CR does not have these conditions, or the condition is set toFalse
, the cluster is still in the installation phase. Additional details about installation are available from theAgentClusterInstall
andClusterDeployment
CRs. For more information, see "Troubleshooting GitOps ZTP". - Cluster configuration phase
-
The cluster configuration phase is shown by a
ztp-running
label applied theManagedCluster
CR for the cluster. - GitOps ZTP done
Cluster installation and configuration is complete in the GitOps ZTP done phase. This is shown by the removal of the
ztp-running
label and addition of theztp-done
label to theManagedCluster
CR. Theztp-done
label shows that the configuration has been applied and the baseline DU configuration has completed cluster tuning.The change to the GitOps ZTP done state is conditional on the compliant state of a Red Hat Advanced Cluster Management (RHACM) validator inform policy. This policy captures the existing criteria for a completed installation and validates that it moves to a compliant state only when GitOps ZTP provisioning of the managed cluster is complete.
The validator inform policy ensures the configuration of the cluster is fully applied and Operators have completed their initialization. The policy validates the following:
-
The target
MachineConfigPool
contains the expected entries and has finished updating. All nodes are available and not degraded. -
The SR-IOV Operator has completed initialization as indicated by at least one
SriovNetworkNodeState
withsyncStatus: Succeeded
. - The PTP Operator daemon set exists.
-
The target
10.2. Advanced managed cluster configuration with PolicyGenTemplate resources
You can use PolicyGenTemplate
CRs to deploy custom functionality in your managed clusters.
Using PolicyGenTemplate
CRs to manage and deploy polices to managed clusters will be deprecated in an upcoming OpenShift Container Platform release. Equivalent and improved functionality is available using Red Hat Advanced Cluster Management (RHACM) and PolicyGenerator
CRs.
For more information about PolicyGenerator
resources, see the RHACM Policy Generator documentation.
Additional resources
10.2.1. Deploying additional changes to clusters
If you require cluster configuration changes outside of the base GitOps Zero Touch Provisioning (ZTP) pipeline configuration, there are three options:
- Apply the additional configuration after the GitOps ZTP pipeline is complete
- When the GitOps ZTP pipeline deployment is complete, the deployed cluster is ready for application workloads. At this point, you can install additional Operators and apply configurations specific to your requirements. Ensure that additional configurations do not negatively affect the performance of the platform or allocated CPU budget.
- Add content to the GitOps ZTP library
- The base source custom resources (CRs) that you deploy with the GitOps ZTP pipeline can be augmented with custom content as required.
- Create extra manifests for the cluster installation
- Extra manifests are applied during installation and make the installation process more efficient.
Providing additional source CRs or modifying existing source CRs can significantly impact the performance or CPU profile of OpenShift Container Platform.
10.2.2. Using PolicyGenTemplate CRs to override source CRs content
PolicyGenTemplate
custom resources (CRs) allow you to overlay additional configuration details on top of the base source CRs provided with the GitOps plugin in the ztp-site-generate
container. You can think of PolicyGenTemplate
CRs as a logical merge or patch to the base CR. Use PolicyGenTemplate
CRs to update a single field of the base CR, or overlay the entire contents of the base CR. You can update values and insert fields that are not in the base CR.
The following example procedure describes how to update fields in the generated PerformanceProfile
CR for the reference configuration based on the PolicyGenTemplate
CR in the group-du-sno-ranGen.yaml
file. Use the procedure as a basis for modifying other parts of the PolicyGenTemplate
based on your requirements.
Prerequisites
- Create a Git repository where you manage your custom site configuration data. The repository must be accessible from the hub cluster and be defined as a source repository for Argo CD.
Procedure
Review the baseline source CR for existing content. You can review the source CRs listed in the reference
PolicyGenTemplate
CRs by extracting them from the GitOps Zero Touch Provisioning (ZTP) container.Create an
/out
folder:$ mkdir -p ./out
Extract the source CRs:
$ podman run --log-driver=none --rm registry.redhat.io/openshift4/ztp-site-generate-rhel8:v4.16.1 extract /home/ztp --tar | tar x -C ./out
Review the baseline
PerformanceProfile
CR in./out/source-crs/PerformanceProfile.yaml
:apiVersion: performance.openshift.io/v2 kind: PerformanceProfile metadata: name: $name annotations: ran.openshift.io/ztp-deploy-wave: "10" spec: additionalKernelArgs: - "idle=poll" - "rcupdate.rcu_normal_after_boot=0" cpu: isolated: $isolated reserved: $reserved hugepages: defaultHugepagesSize: $defaultHugepagesSize pages: - size: $size count: $count node: $node machineConfigPoolSelector: pools.operator.machineconfiguration.openshift.io/$mcp: "" net: userLevelNetworking: true nodeSelector: node-role.kubernetes.io/$mcp: '' numa: topologyPolicy: "restricted" realTimeKernel: enabled: true
NoteAny fields in the source CR which contain
$…
are removed from the generated CR if they are not provided in thePolicyGenTemplate
CR.Update the
PolicyGenTemplate
entry forPerformanceProfile
in thegroup-du-sno-ranGen.yaml
reference file. The following examplePolicyGenTemplate
CR stanza supplies appropriate CPU specifications, sets thehugepages
configuration, and adds a new field that setsgloballyDisableIrqLoadBalancing
to false.- fileName: PerformanceProfile.yaml policyName: "config-policy" metadata: name: openshift-node-performance-profile spec: cpu: # These must be tailored for the specific hardware platform isolated: "2-19,22-39" reserved: "0-1,20-21" hugepages: defaultHugepagesSize: 1G pages: - size: 1G count: 10 globallyDisableIrqLoadBalancing: false
Commit the
PolicyGenTemplate
change in Git, and then push to the Git repository being monitored by the GitOps ZTP argo CD application.Example output
The GitOps ZTP application generates an RHACM policy that contains the generated
PerformanceProfile
CR. The contents of that CR are derived by merging themetadata
andspec
contents from thePerformanceProfile
entry in thePolicyGenTemplate
onto the source CR. The resulting CR has the following content:--- apiVersion: performance.openshift.io/v2 kind: PerformanceProfile metadata: name: openshift-node-performance-profile spec: additionalKernelArgs: - idle=poll - rcupdate.rcu_normal_after_boot=0 cpu: isolated: 2-19,22-39 reserved: 0-1,20-21 globallyDisableIrqLoadBalancing: false hugepages: defaultHugepagesSize: 1G pages: - count: 10 size: 1G machineConfigPoolSelector: pools.operator.machineconfiguration.openshift.io/master: "" net: userLevelNetworking: true nodeSelector: node-role.kubernetes.io/master: "" numa: topologyPolicy: restricted realTimeKernel: enabled: true
In the /source-crs
folder that you extract from the ztp-site-generate
container, the $
syntax is not used for template substitution as implied by the syntax. Rather, if the policyGen
tool sees the $
prefix for a string and you do not specify a value for that field in the related PolicyGenTemplate
CR, the field is omitted from the output CR entirely.
An exception to this is the $mcp
variable in /source-crs
YAML files that is substituted with the specified value for mcp
from the PolicyGenTemplate
CR. For example, in example/policygentemplates/group-du-standard-ranGen.yaml
, the value for mcp
is worker
:
spec: bindingRules: group-du-standard: "" mcp: "worker"
The policyGen
tool replace instances of $mcp
with worker
in the output CRs.
10.2.3. Adding custom content to the GitOps ZTP pipeline
Perform the following procedure to add new content to the GitOps ZTP pipeline.
Procedure
-
Create a subdirectory named
source-crs
in the directory that contains thekustomization.yaml
file for thePolicyGenTemplate
custom resource (CR). Add your user-provided CRs to the
source-crs
subdirectory, as shown in the following example:example └── policygentemplates ├── dev.yaml ├── kustomization.yaml ├── mec-edge-sno1.yaml ├── sno.yaml └── source-crs 1 ├── PaoCatalogSource.yaml ├── PaoSubscription.yaml ├── custom-crs | ├── apiserver-config.yaml | └── disable-nic-lldp.yaml └── elasticsearch ├── ElasticsearchNS.yaml └── ElasticsearchOperatorGroup.yaml
- 1
- The
source-crs
subdirectory must be in the same directory as thekustomization.yaml
file.
Update the required
PolicyGenTemplate
CRs to include references to the content you added in thesource-crs/custom-crs
andsource-crs/elasticsearch
directories. For example:apiVersion: ran.openshift.io/v1 kind: PolicyGenTemplate metadata: name: "group-dev" namespace: "ztp-clusters" spec: bindingRules: dev: "true" mcp: "master" sourceFiles: # These policies/CRs come from the internal container Image #Cluster Logging - fileName: ClusterLogNS.yaml remediationAction: inform policyName: "group-dev-cluster-log-ns" - fileName: ClusterLogOperGroup.yaml remediationAction: inform policyName: "group-dev-cluster-log-operator-group" - fileName: ClusterLogSubscription.yaml remediationAction: inform policyName: "group-dev-cluster-log-sub" #Local Storage Operator - fileName: StorageNS.yaml remediationAction: inform policyName: "group-dev-lso-ns" - fileName: StorageOperGroup.yaml remediationAction: inform policyName: "group-dev-lso-operator-group" - fileName: StorageSubscription.yaml remediationAction: inform policyName: "group-dev-lso-sub" #These are custom local polices that come from the source-crs directory in the git repo # Performance Addon Operator - fileName: PaoSubscriptionNS.yaml remediationAction: inform policyName: "group-dev-pao-ns" - fileName: PaoSubscriptionCatalogSource.yaml remediationAction: inform policyName: "group-dev-pao-cat-source" spec: image: <container_image_url> - fileName: PaoSubscription.yaml remediationAction: inform policyName: "group-dev-pao-sub" #Elasticsearch Operator - fileName: elasticsearch/ElasticsearchNS.yaml 1 remediationAction: inform policyName: "group-dev-elasticsearch-ns" - fileName: elasticsearch/ElasticsearchOperatorGroup.yaml remediationAction: inform policyName: "group-dev-elasticsearch-operator-group" #Custom Resources - fileName: custom-crs/apiserver-config.yaml 2 remediationAction: inform policyName: "group-dev-apiserver-config" - fileName: custom-crs/disable-nic-lldp.yaml remediationAction: inform policyName: "group-dev-disable-nic-lldp"
-
Commit the
PolicyGenTemplate
change in Git, and then push to the Git repository that is monitored by the GitOps ZTP Argo CD policies application. Update the
ClusterGroupUpgrade
CR to include the changedPolicyGenTemplate
and save it ascgu-test.yaml
. The following example shows a generatedcgu-test.yaml
file.apiVersion: ran.openshift.io/v1alpha1 kind: ClusterGroupUpgrade metadata: name: custom-source-cr namespace: ztp-clusters spec: managedPolicies: - group-dev-config-policy enable: true clusters: - cluster1 remediationStrategy: maxConcurrency: 2 timeout: 240
Apply the updated
ClusterGroupUpgrade
CR by running the following command:$ oc apply -f cgu-test.yaml
Verification
Check that the updates have succeeded by running the following command:
$ oc get cgu -A
Example output
NAMESPACE NAME AGE STATE DETAILS ztp-clusters custom-source-cr 6s InProgress Remediating non-compliant policies ztp-install cluster1 19h Completed All clusters are compliant with all the managed policies
10.2.4. Configuring policy compliance evaluation timeouts for PolicyGenTemplate CRs
Use Red Hat Advanced Cluster Management (RHACM) installed on a hub cluster to monitor and report on whether your managed clusters are compliant with applied policies. RHACM uses policy templates to apply predefined policy controllers and policies. Policy controllers are Kubernetes custom resource definition (CRD) instances.
You can override the default policy evaluation intervals with PolicyGenTemplate
custom resources (CRs). You configure duration settings that define how long a ConfigurationPolicy
CR can be in a state of policy compliance or non-compliance before RHACM re-evaluates the applied cluster policies.
The GitOps Zero Touch Provisioning (ZTP) policy generator generates ConfigurationPolicy
CR policies with pre-defined policy evaluation intervals. The default value for the noncompliant
state is 10 seconds. The default value for the compliant
state is 10 minutes. To disable the evaluation interval, set the value to never
.
Prerequisites
-
You have installed the OpenShift CLI (
oc
). -
You have logged in to the hub cluster as a user with
cluster-admin
privileges. - You have created a Git repository where you manage your custom site configuration data.
Procedure
To configure the evaluation interval for all policies in a
PolicyGenTemplate
CR, set appropriatecompliant
andnoncompliant
values for theevaluationInterval
field. For example:spec: evaluationInterval: compliant: 30m noncompliant: 20s
NoteYou can also set
compliant
andnoncompliant
fields tonever
to stop evaluating the policy after it reaches particular compliance state.To configure the evaluation interval for an individual policy object in a
PolicyGenTemplate
CR, add theevaluationInterval
field and set appropriate values. For example:spec: sourceFiles: - fileName: SriovSubscription.yaml policyName: "sriov-sub-policy" evaluationInterval: compliant: never noncompliant: 10s
-
Commit the
PolicyGenTemplate
CRs files in the Git repository and push your changes.
Verification
Check that the managed spoke cluster policies are monitored at the expected intervals.
-
Log in as a user with
cluster-admin
privileges on the managed cluster. Get the pods that are running in the
open-cluster-management-agent-addon
namespace. Run the following command:$ oc get pods -n open-cluster-management-agent-addon
Example output
NAME READY STATUS RESTARTS AGE config-policy-controller-858b894c68-v4xdb 1/1 Running 22 (5d8h ago) 10d
Check the applied policies are being evaluated at the expected interval in the logs for the
config-policy-controller
pod:$ oc logs -n open-cluster-management-agent-addon config-policy-controller-858b894c68-v4xdb
Example output
2022-05-10T15:10:25.280Z info configuration-policy-controller controllers/configurationpolicy_controller.go:166 Skipping the policy evaluation due to the policy not reaching the evaluation interval {"policy": "compute-1-config-policy-config"} 2022-05-10T15:10:25.280Z info configuration-policy-controller controllers/configurationpolicy_controller.go:166 Skipping the policy evaluation due to the policy not reaching the evaluation interval {"policy": "compute-1-common-compute-1-catalog-policy-config"}
10.2.5. Signalling GitOps ZTP cluster deployment completion with validator inform policies
Create a validator inform policy that signals when the GitOps Zero Touch Provisioning (ZTP) installation and configuration of the deployed cluster is complete. This policy can be used for deployments of single-node OpenShift clusters, three-node clusters, and standard clusters.
Procedure
Create a standalone
PolicyGenTemplate
custom resource (CR) that contains the source filevalidatorCRs/informDuValidator.yaml
. You only need one standalonePolicyGenTemplate
CR for each cluster type. For example, this CR applies a validator inform policy for single-node OpenShift clusters:Example single-node cluster validator inform policy CR (group-du-sno-validator-ranGen.yaml)
apiVersion: ran.openshift.io/v1 kind: PolicyGenTemplate metadata: name: "group-du-sno-validator" 1 namespace: "ztp-group" 2 spec: bindingRules: group-du-sno: "" 3 bindingExcludedRules: ztp-done: "" 4 mcp: "master" 5 sourceFiles: - fileName: validatorCRs/informDuValidator.yaml remediationAction: inform 6 policyName: "du-policy" 7
- 1
- The name of the
{policy-gen-crs}
object. This name is also used as part of the names for theplacementBinding
,placementRule
, andpolicy
that are created in the requestednamespace
. - 2
- This value should match the
namespace
used in the grouppolicy-gen-crs
. - 3
- The
group-du-*
label defined inbindingRules
must exist in theSiteConfig
files. - 4
- The label defined in
bindingExcludedRules
must be`ztp-done:`. Theztp-done
label is used in coordination with the Topology Aware Lifecycle Manager. - 5
mcp
defines theMachineConfigPool
object that is used in the source filevalidatorCRs/informDuValidator.yaml
. It should bemaster
for single node and three-node cluster deployments andworker
for standard cluster deployments.- 6
- Optional. The default value is
inform
. - 7
- This value is used as part of the name for the generated RHACM policy. The generated validator policy for the single node example is
group-du-sno-validator-du-policy
.
-
Commit the
PolicyGenTemplate
CR file in your Git repository and push the changes.
Additional resources
10.2.6. Configuring power states using PolicyGenTemplate CRs
For low latency and high-performance edge deployments, it is necessary to disable or limit C-states and P-states. With this configuration, the CPU runs at a constant frequency, which is typically the maximum turbo frequency. This ensures that the CPU is always running at its maximum speed, which results in high performance and low latency. This leads to the best latency for workloads. However, this also leads to the highest power consumption, which might not be necessary for all workloads.
Workloads can be classified as critical or non-critical, with critical workloads requiring disabled C-state and P-state settings for high performance and low latency, while non-critical workloads use C-state and P-state settings for power savings at the expense of some latency and performance. You can configure the following three power states using GitOps Zero Touch Provisioning (ZTP):
- High-performance mode provides ultra low latency at the highest power consumption.
- Performance mode provides low latency at a relatively high power consumption.
- Power saving balances reduced power consumption with increased latency.
The default configuration is for a low latency, performance mode.
PolicyGenTemplate
custom resources (CRs) allow you to overlay additional configuration details onto the base source CRs provided with the GitOps plugin in the ztp-site-generate
container.
Configure the power states by updating the workloadHints
fields in the generated PerformanceProfile
CR for the reference configuration, based on the PolicyGenTemplate
CR in the group-du-sno-ranGen.yaml
.
The following common prerequisites apply to configuring all three power states.
Prerequisites
- You have created a Git repository where you manage your custom site configuration data. The repository must be accessible from the hub cluster and be defined as a source repository for Argo CD.
- You have followed the procedure described in "Preparing the GitOps ZTP site configuration repository".
Additional resources
10.2.6.1. Configuring performance mode using PolicyGenTemplate CRs
Follow this example to set performance mode by updating the workloadHints
fields in the generated PerformanceProfile
CR for the reference configuration, based on the PolicyGenTemplate
CR in the group-du-sno-ranGen.yaml
.
Performance mode provides low latency at a relatively high power consumption.
Prerequisites
- You have configured the BIOS with performance related settings by following the guidance in "Configuring host firmware for low latency and high performance".
Procedure
Update the
PolicyGenTemplate
entry forPerformanceProfile
in thegroup-du-sno-ranGen.yaml
reference file inout/argocd/example/policygentemplates//
as follows to set performance mode.- fileName: PerformanceProfile.yaml policyName: "config-policy" metadata: # ... spec: # ... workloadHints: realTime: true highPowerConsumption: false perPodPowerManagement: false
-
Commit the
PolicyGenTemplate
change in Git, and then push to the Git repository being monitored by the GitOps ZTP Argo CD application.
10.2.6.2. Configuring high-performance mode using PolicyGenTemplate CRs
Follow this example to set high performance mode by updating the workloadHints
fields in the generated PerformanceProfile
CR for the reference configuration, based on the PolicyGenTemplate
CR in the group-du-sno-ranGen.yaml
.
High performance mode provides ultra low latency at the highest power consumption.
Prerequisites
- You have configured the BIOS with performance related settings by following the guidance in "Configuring host firmware for low latency and high performance".
Procedure
Update the
PolicyGenTemplate
entry forPerformanceProfile
in thegroup-du-sno-ranGen.yaml
reference file inout/argocd/example/policygentemplates/
as follows to set high-performance mode.- fileName: PerformanceProfile.yaml policyName: "config-policy" metadata: # ... spec: # ... workloadHints: realTime: true highPowerConsumption: true perPodPowerManagement: false
-
Commit the
PolicyGenTemplate
change in Git, and then push to the Git repository being monitored by the GitOps ZTP Argo CD application.
10.2.6.3. Configuring power saving mode using PolicyGenTemplate CRs
Follow this example to set power saving mode by updating the workloadHints
fields in the generated PerformanceProfile
CR for the reference configuration, based on the PolicyGenTemplate
CR in the group-du-sno-ranGen.yaml
.
The power saving mode balances reduced power consumption with increased latency.
Prerequisites
- You enabled C-states and OS-controlled P-states in the BIOS.
Procedure
Update the
PolicyGenTemplate
entry forPerformanceProfile
in thegroup-du-sno-ranGen.yaml
reference file inout/argocd/example/policygentemplates/
as follows to configure power saving mode. It is recommended to configure the CPU governor for the power saving mode through the additional kernel arguments object.- fileName: PerformanceProfile.yaml policyName: "config-policy" metadata: # ... spec: # ... workloadHints: realTime: true highPowerConsumption: false perPodPowerManagement: true # ... additionalKernelArgs: - # ... - "cpufreq.default_governor=schedutil" 1
- 1
- The
schedutil
governor is recommended, however, other governors that can be used includeondemand
andpowersave
.
-
Commit the
PolicyGenTemplate
change in Git, and then push to the Git repository being monitored by the GitOps ZTP Argo CD application.
Verification
Select a worker node in your deployed cluster from the list of nodes identified by using the following command:
$ oc get nodes
Log in to the node by using the following command:
$ oc debug node/<node-name>
Replace
<node-name>
with the name of the node you want to verify the power state on.Set
/host
as the root directory within the debug shell. The debug pod mounts the host’s root file system in/host
within the pod. By changing the root directory to/host
, you can run binaries contained in the host’s executable paths as shown in the following example:# chroot /host
Run the following command to verify the applied power state:
# cat /proc/cmdline
Expected output
-
For power saving mode the
intel_pstate=passive
.
10.2.6.4. Maximizing power savings
Limiting the maximum CPU frequency is recommended to achieve maximum power savings. Enabling C-states on the non-critical workload CPUs without restricting the maximum CPU frequency negates much of the power savings by boosting the frequency of the critical CPUs.
Maximize power savings by updating the sysfs
plugin fields, setting an appropriate value for max_perf_pct
in the TunedPerformancePatch
CR for the reference configuration. This example based on the group-du-sno-ranGen.yaml
describes the procedure to follow to restrict the maximum CPU frequency.
Prerequisites
- You have configured power savings mode as described in "Using PolicyGenTemplate CRs to configure power savings mode".
Procedure
Update the
PolicyGenTemplate
entry forTunedPerformancePatch
in thegroup-du-sno-ranGen.yaml
reference file inout/argocd/example/policygentemplates/
. To maximize power savings, addmax_perf_pct
as shown in the following example:- fileName: TunedPerformancePatch.yaml policyName: "config-policy" spec: profile: - name: performance-patch data: | # ... [sysfs] /sys/devices/system/cpu/intel_pstate/max_perf_pct=<x> 1
- 1
- The
max_perf_pct
controls the maximum frequency thecpufreq
driver is allowed to set as a percentage of the maximum supported CPU frequency. This value applies to all CPUs. You can check the maximum supported frequency in/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq
. As a starting point, you can use a percentage that caps all CPUs at theAll Cores Turbo
frequency. TheAll Cores Turbo
frequency is the frequency that all cores will run at when the cores are all fully occupied.
NoteTo maximize power savings, set a lower value. Setting a lower value for
max_perf_pct
limits the maximum CPU frequency, thereby reducing power consumption, but also potentially impacting performance. Experiment with different values and monitor the system’s performance and power consumption to find the optimal setting for your use-case.-
Commit the
PolicyGenTemplate
change in Git, and then push to the Git repository being monitored by the GitOps ZTP Argo CD application.
10.2.7. Configuring LVM Storage using PolicyGenTemplate CRs
You can configure Logical Volume Manager (LVM) Storage for managed clusters that you deploy with GitOps Zero Touch Provisioning (ZTP).
You use LVM Storage to persist event subscriptions when you use PTP events or bare-metal hardware events with HTTP transport.
Use the Local Storage Operator for persistent storage that uses local volumes in distributed units.
Prerequisites
-
Install the OpenShift CLI (
oc
). -
Log in as a user with
cluster-admin
privileges. - Create a Git repository where you manage your custom site configuration data.
Procedure
To configure LVM Storage for new managed clusters, add the following YAML to
spec.sourceFiles
in thecommon-ranGen.yaml
file:- fileName: StorageLVMOSubscriptionNS.yaml policyName: subscription-policies - fileName: StorageLVMOSubscriptionOperGroup.yaml policyName: subscription-policies - fileName: StorageLVMOSubscription.yaml spec: name: lvms-operator channel: stable-4.16 policyName: subscription-policies
NoteThe Storage LVMO subscription is deprecated. In future releases of OpenShift Container Platform, the storage LVMO subscription will not be available. Instead, you must use the Storage LVMS subscription.
In OpenShift Container Platform 4.16, you can use the Storage LVMS subscription instead of the LVMO subscription. The LVMS subscription does not require manual overrides in the
common-ranGen.yaml
file. Add the following YAML tospec.sourceFiles
in thecommon-ranGen.yaml
file to use the Storage LVMS subscription:- fileName: StorageLVMSubscriptionNS.yaml policyName: subscription-policies - fileName: StorageLVMSubscriptionOperGroup.yaml policyName: subscription-policies - fileName: StorageLVMSubscription.yaml policyName: subscription-policies
Add the
LVMCluster
CR tospec.sourceFiles
in your specific group or individual site configuration file. For example, in thegroup-du-sno-ranGen.yaml
file, add the following:- fileName: StorageLVMCluster.yaml policyName: "lvms-config" spec: storage: deviceClasses: - name: vg1 thinPoolConfig: name: thin-pool-1 sizePercent: 90 overprovisionRatio: 10
This example configuration creates a volume group (
vg1
) with all the available devices, except the disk where OpenShift Container Platform is installed. A thin-pool logical volume is also created.- Merge any other required changes and files with your custom site repository.
-
Commit the
PolicyGenTemplate
changes in Git, and then push the changes to your site configuration repository to deploy LVM Storage to new sites using GitOps ZTP.
10.2.8. Configuring PTP events with PolicyGenTemplate CRs
You can use the GitOps ZTP pipeline to configure PTP events that use HTTP transport.
10.2.8.1. Configuring PTP events that use HTTP transport
You can configure PTP events that use HTTP transport on managed clusters that you deploy with the GitOps Zero Touch Provisioning (ZTP) pipeline.
Prerequisites
-
You have installed the OpenShift CLI (
oc
). -
You have logged in as a user with
cluster-admin
privileges. - You have created a Git repository where you manage your custom site configuration data.
Procedure
Apply the following
PolicyGenTemplate
changes togroup-du-3node-ranGen.yaml
,group-du-sno-ranGen.yaml
, orgroup-du-standard-ranGen.yaml
files according to your requirements:In
spec.sourceFiles
, add thePtpOperatorConfig
CR file that configures the transport host:- fileName: PtpOperatorConfigForEvent.yaml policyName: "config-policy" spec: daemonNodeSelector: {} ptpEventConfig: enableEventPublisher: true transportHost: http://ptp-event-publisher-service-NODE_NAME.openshift-ptp.svc.cluster.local:9043
NoteIn OpenShift Container Platform 4.13 or later, you do not need to set the
transportHost
field in thePtpOperatorConfig
resource when you use HTTP transport with PTP events.Configure the
linuxptp
andphc2sys
for the PTP clock type and interface. For example, add the following YAML intospec.sourceFiles
:- fileName: PtpConfigSlave.yaml 1 policyName: "config-policy" metadata: name: "du-ptp-slave" spec: profile: - name: "slave" interface: "ens5f1" 2 ptp4lOpts: "-2 -s --summary_interval -4" 3 phc2sysOpts: "-a -r -m -n 24 -N 8 -R 16" 4 ptpClockThreshold: 5 holdOverTimeout: 30 # seconds maxOffsetThreshold: 100 # nano seconds minOffsetThreshold: -100
- 1
- Can be
PtpConfigMaster.yaml
orPtpConfigSlave.yaml
depending on your requirements. For configurations based ongroup-du-sno-ranGen.yaml
orgroup-du-3node-ranGen.yaml
, usePtpConfigSlave.yaml
. - 2
- Device specific interface name.
- 3
- You must append the
--summary_interval -4
value toptp4lOpts
in.spec.sourceFiles.spec.profile
to enable PTP fast events. - 4
- Required
phc2sysOpts
values.-m
prints messages tostdout
. Thelinuxptp-daemon
DaemonSet
parses the logs and generates Prometheus metrics. - 5
- Optional. If the
ptpClockThreshold
stanza is not present, default values are used for theptpClockThreshold
fields. The stanza shows defaultptpClockThreshold
values. TheptpClockThreshold
values configure how long after the PTP master clock is disconnected before PTP events are triggered.holdOverTimeout
is the time value in seconds before the PTP clock event state changes toFREERUN
when the PTP master clock is disconnected. ThemaxOffsetThreshold
andminOffsetThreshold
settings configure offset values in nanoseconds that compare against the values forCLOCK_REALTIME
(phc2sys
) or master offset (ptp4l
). When theptp4l
orphc2sys
offset value is outside this range, the PTP clock state is set toFREERUN
. When the offset value is within this range, the PTP clock state is set toLOCKED
.
- Merge any other required changes and files with your custom site repository.
- Push the changes to your site configuration repository to deploy PTP fast events to new sites using GitOps ZTP.
Additional resources
Additional resources
10.2.9. Configuring bare-metal events with PolicyGenTemplate CRs
You can use the GitOps ZTP pipeline to configure bare-metal events that use HTTP or AMQP transport.
HTTP transport is the default transport for PTP and bare-metal events. Use HTTP transport instead of AMQP for PTP and bare-metal events where possible. AMQ Interconnect is EOL from 30 June 2024. Extended life cycle support (ELS) for AMQ Interconnect ends 29 November 2029. For more information see, Red Hat AMQ Interconnect support status.
10.2.9.1. Configuring bare-metal events that use HTTP transport
You can configure bare-metal events that use HTTP transport on managed clusters that you deploy with the GitOps Zero Touch Provisioning (ZTP) pipeline.
Prerequisites
-
You have installed the OpenShift CLI (
oc
). -
You have logged in as a user with
cluster-admin
privileges. - You have created a Git repository where you manage your custom site configuration data.
Procedure
Configure the Bare Metal Event Relay Operator by adding the following YAML to
spec.sourceFiles
in thecommon-ranGen.yaml
file:# Bare Metal Event Relay Operator - fileName: BareMetalEventRelaySubscriptionNS.yaml policyName: "subscriptions-policy" - fileName: BareMetalEventRelaySubscriptionOperGroup.yaml policyName: "subscriptions-policy" - fileName: BareMetalEventRelaySubscription.yaml policyName: "subscriptions-policy"
Add the
HardwareEvent
CR tospec.sourceFiles
in your specific group configuration file, for example, in thegroup-du-sno-ranGen.yaml
file:- fileName: HardwareEvent.yaml 1 policyName: "config-policy" spec: nodeSelector: {} transportHost: "http://hw-event-publisher-service.openshift-bare-metal-events.svc.cluster.local:9043" logLevel: "info"
- 1
- Each baseboard management controller (BMC) requires a single
HardwareEvent
CR only.
NoteIn OpenShift Container Platform 4.13 or later, you do not need to set the
transportHost
field in theHardwareEvent
custom resource (CR) when you use HTTP transport with bare-metal events.- Merge any other required changes and files with your custom site repository.
- Push the changes to your site configuration repository to deploy bare-metal events to new sites with GitOps ZTP.
Create the Redfish Secret by running the following command:
$ oc -n openshift-bare-metal-events create secret generic redfish-basic-auth \ --from-literal=username=<bmc_username> --from-literal=password=<bmc_password> \ --from-literal=hostaddr="<bmc_host_ip_addr>"
10.2.9.2. Configuring bare-metal events that use AMQP transport
You can configure bare-metal events that use AMQP transport on managed clusters that you deploy with the GitOps Zero Touch Provisioning (ZTP) pipeline.
HTTP transport is the default transport for PTP and bare-metal events. Use HTTP transport instead of AMQP for PTP and bare-metal events where possible. AMQ Interconnect is EOL from 30 June 2024. Extended life cycle support (ELS) for AMQ Interconnect ends 29 November 2029. For more information see, Red Hat AMQ Interconnect support status.
Prerequisites
-
You have installed the OpenShift CLI (
oc
). -
You have logged in as a user with
cluster-admin
privileges. - You have created a Git repository where you manage your custom site configuration data.
Procedure
To configure the AMQ Interconnect Operator and the Bare Metal Event Relay Operator, add the following YAML to
spec.sourceFiles
in thecommon-ranGen.yaml
file:# AMQ Interconnect Operator for fast events - fileName: AmqSubscriptionNS.yaml policyName: "subscriptions-policy" - fileName: AmqSubscriptionOperGroup.yaml policyName: "subscriptions-policy" - fileName: AmqSubscription.yaml policyName: "subscriptions-policy" # Bare Metal Event Relay Operator - fileName: BareMetalEventRelaySubscriptionNS.yaml policyName: "subscriptions-policy" - fileName: BareMetalEventRelaySubscriptionOperGroup.yaml policyName: "subscriptions-policy" - fileName: BareMetalEventRelaySubscription.yaml policyName: "subscriptions-policy"
Add the
Interconnect
CR tospec.sourceFiles
in the site configuration file, for example, theexample-sno-site.yaml
file:- fileName: AmqInstance.yaml policyName: "config-policy"
Add the
HardwareEvent
CR tospec.sourceFiles
in your specific group configuration file, for example, in thegroup-du-sno-ranGen.yaml
file:- path: HardwareEvent.yaml patches: nodeSelector: {} transportHost: "amqp://<amq_interconnect_name>.<amq_interconnect_namespace>.svc.cluster.local" 1 logLevel: "info"
- 1
- The
transportHost
URL is composed of the existing AMQ Interconnect CRname
andnamespace
. For example, intransportHost: "amqp://amq-router.amq-router.svc.cluster.local"
, the AMQ Interconnectname
andnamespace
are both set toamq-router
.
NoteEach baseboard management controller (BMC) requires a single
HardwareEvent
resource only.-
Commit the
PolicyGenTemplate
change in Git, and then push the changes to your site configuration repository to deploy bare-metal events monitoring to new sites using GitOps ZTP. Create the Redfish Secret by running the following command:
$ oc -n openshift-bare-metal-events create secret generic redfish-basic-auth \ --from-literal=username=<bmc_username> --from-literal=password=<bmc_password> \ --from-literal=hostaddr="<bmc_host_ip_addr>"
10.2.10. Configuring the Image Registry Operator for local caching of images
OpenShift Container Platform manages image caching using a local registry. In edge computing use cases, clusters are often subject to bandwidth restrictions when communicating with centralized image registries, which might result in long image download times.
Long download times are unavoidable during initial deployment. Over time, there is a risk that CRI-O will erase the /var/lib/containers/storage
directory in the case of an unexpected shutdown. To address long image download times, you can create a local image registry on remote managed clusters using GitOps Zero Touch Provisioning (ZTP). This is useful in Edge computing scenarios where clusters are deployed at the far edge of the network.
Before you can set up the local image registry with GitOps ZTP, you need to configure disk partitioning in the SiteConfig
CR that you use to install the remote managed cluster. After installation, you configure the local image registry using a PolicyGenTemplate
CR. Then, the GitOps ZTP pipeline creates Persistent Volume (PV) and Persistent Volume Claim (PVC) CRs and patches the imageregistry
configuration.
The local image registry can only be used for user application images and cannot be used for the OpenShift Container Platform or Operator Lifecycle Manager operator images.
Additional resources
10.2.10.1. Configuring disk partitioning with SiteConfig
Configure disk partitioning for a managed cluster using a SiteConfig
CR and GitOps Zero Touch Provisioning (ZTP). The disk partition details in the SiteConfig
CR must match the underlying disk.
You must complete this procedure at installation time.
Prerequisites
- Install Butane.
Procedure
Create the
storage.bu
file.variant: fcos version: 1.3.0 storage: disks: - device: /dev/disk/by-path/pci-0000:01:00.0-scsi-0:2:0:0 1 wipe_table: false partitions: - label: var-lib-containers start_mib: <start_of_partition> 2 size_mib: <partition_size> 3 filesystems: - path: /var/lib/containers device: /dev/disk/by-partlabel/var-lib-containers format: xfs wipe_filesystem: true with_mount_unit: true mount_options: - defaults - prjquota
Convert the
storage.bu
to an Ignition file by running the following command:$ butane storage.bu
Example output
{"ignition":{"version":"3.2.0"},"storage":{"disks":[{"device":"/dev/disk/by-path/pci-0000:01:00.0-scsi-0:2:0:0","partitions":[{"label":"var-lib-containers","sizeMiB":0,"startMiB":250000}],"wipeTable":false}],"filesystems":[{"device":"/dev/disk/by-partlabel/var-lib-containers","format":"xfs","mountOptions":["defaults","prjquota"],"path":"/var/lib/containers","wipeFilesystem":true}]},"systemd":{"units":[{"contents":"# # Generated by Butane\n[Unit]\nRequires=systemd-fsck@dev-disk-by\\x2dpartlabel-var\\x2dlib\\x2dcontainers.service\nAfter=systemd-fsck@dev-disk-by\\x2dpartlabel-var\\x2dlib\\x2dcontainers.service\n\n[Mount]\nWhere=/var/lib/containers\nWhat=/dev/disk/by-partlabel/var-lib-containers\nType=xfs\nOptions=defaults,prjquota\n\n[Install]\nRequiredBy=local-fs.target","enabled":true,"name":"var-lib-containers.mount"}]}}
- Use a tool such as JSON Pretty Print to convert the output into JSON format.
Copy the output into the
.spec.clusters.nodes.ignitionConfigOverride
field in theSiteConfig
CR.Example
[...] spec: clusters: - nodes: - ignitionConfigOverride: | { "ignition": { "version": "3.2.0" }, "storage": { "disks": [ { "device": "/dev/disk/by-path/pci-0000:01:00.0-scsi-0:2:0:0", "partitions": [ { "label": "var-lib-containers", "sizeMiB": 0, "startMiB": 250000 } ], "wipeTable": false } ], "filesystems": [ { "device": "/dev/disk/by-partlabel/var-lib-containers", "format": "xfs", "mountOptions": [ "defaults", "prjquota" ], "path": "/var/lib/containers", "wipeFilesystem": true } ] }, "systemd": { "units": [ { "contents": "# # Generated by Butane\n[Unit]\nRequires=systemd-fsck@dev-disk-by\\x2dpartlabel-var\\x2dlib\\x2dcontainers.service\nAfter=systemd-fsck@dev-disk-by\\x2dpartlabel-var\\x2dlib\\x2dcontainers.service\n\n[Mount]\nWhere=/var/lib/containers\nWhat=/dev/disk/by-partlabel/var-lib-containers\nType=xfs\nOptions=defaults,prjquota\n\n[Install]\nRequiredBy=local-fs.target", "enabled": true, "name": "var-lib-containers.mount" } ] } } [...]
NoteIf the
.spec.clusters.nodes.ignitionConfigOverride
field does not exist, create it.
Verification
During or after installation, verify on the hub cluster that the
BareMetalHost
object shows the annotation by running the following command:$ oc get bmh -n my-sno-ns my-sno -ojson | jq '.metadata.annotations["bmac.agent-install.openshift.io/ignition-config-overrides"]
Example output
"{\"ignition\":{\"version\":\"3.2.0\"},\"storage\":{\"disks\":[{\"device\":\"/dev/disk/by-id/wwn-0x6b07b250ebb9d0002a33509f24af1f62\",\"partitions\":[{\"label\":\"var-lib-containers\",\"sizeMiB\":0,\"startMiB\":250000}],\"wipeTable\":false}],\"filesystems\":[{\"device\":\"/dev/disk/by-partlabel/var-lib-containers\",\"format\":\"xfs\",\"mountOptions\":[\"defaults\",\"prjquota\"],\"path\":\"/var/lib/containers\",\"wipeFilesystem\":true}]},\"systemd\":{\"units\":[{\"contents\":\"# Generated by Butane\\n[Unit]\\nRequires=systemd-fsck@dev-disk-by\\\\x2dpartlabel-var\\\\x2dlib\\\\x2dcontainers.service\\nAfter=systemd-fsck@dev-disk-by\\\\x2dpartlabel-var\\\\x2dlib\\\\x2dcontainers.service\\n\\n[Mount]\\nWhere=/var/lib/containers\\nWhat=/dev/disk/by-partlabel/var-lib-containers\\nType=xfs\\nOptions=defaults,prjquota\\n\\n[Install]\\nRequiredBy=local-fs.target\",\"enabled\":true,\"name\":\"var-lib-containers.mount\"}]}}"
After installation, check the single-node OpenShift disk status.
Enter into a debug session on the single-node OpenShift node by running the following command. This step instantiates a debug pod called
<node_name>-debug
:$ oc debug node/my-sno-node
Set
/host
as the root directory within the debug shell by running the following command. The debug pod mounts the host’s root file system in/host
within the pod. By changing the root directory to/host
, you can run binaries contained in the host’s executable paths:# chroot /host
List information about all available block devices by running the following command:
# lsblk
Example output
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS sda 8:0 0 446.6G 0 disk ├─sda1 8:1 0 1M 0 part ├─sda2 8:2 0 127M 0 part ├─sda3 8:3 0 384M 0 part /boot ├─sda4 8:4 0 243.6G 0 part /var │ /sysroot/ostree/deploy/rhcos/var │ /usr │ /etc │ / │ /sysroot └─sda5 8:5 0 202.5G 0 part /var/lib/containers
Display information about the file system disk space usage by running the following command:
# df -h
Example output
Filesystem Size Used Avail Use% Mounted on devtmpfs 4.0M 0 4.0M 0% /dev tmpfs 126G 84K 126G 1% /dev/shm tmpfs 51G 93M 51G 1% /run /dev/sda4 244G 5.2G 239G 3% /sysroot tmpfs 126G 4.0K 126G 1% /tmp /dev/sda5 203G 119G 85G 59% /var/lib/containers /dev/sda3 350M 110M 218M 34% /boot tmpfs 26G 0 26G 0% /run/user/1000
10.2.10.2. Configuring the image registry using PolicyGenTemplate CRs
Use PolicyGenTemplate
(PGT) CRs to apply the CRs required to configure the image registry and patch the imageregistry
configuration.
Prerequisites
- You have configured a disk partition in the managed cluster.
-
You have installed the OpenShift CLI (
oc
). -
You have logged in to the hub cluster as a user with
cluster-admin
privileges. - You have created a Git repository where you manage your custom site configuration data for use with GitOps Zero Touch Provisioning (ZTP).
Procedure
Configure the storage class, persistent volume claim, persistent volume, and image registry configuration in the appropriate
PolicyGenTemplate
CR. For example, to configure an individual site, add the following YAML to the fileexample-sno-site.yaml
:sourceFiles: # storage class - fileName: StorageClass.yaml policyName: "sc-for-image-registry" metadata: name: image-registry-sc annotations: ran.openshift.io/ztp-deploy-wave: "100" 1 # persistent volume claim - fileName: StoragePVC.yaml policyName: "pvc-for-image-registry" metadata: name: image-registry-pvc namespace: openshift-image-registry annotations: ran.openshift.io/ztp-deploy-wave: "100" spec: accessModes: - ReadWriteMany resources: requests: storage: 100Gi storageClassName: image-registry-sc volumeMode: Filesystem # persistent volume - fileName: ImageRegistryPV.yaml 2 policyName: "pv-for-image-registry" metadata: annotations: ran.openshift.io/ztp-deploy-wave: "100" - fileName: ImageRegistryConfig.yaml policyName: "config-for-image-registry" complianceType: musthave metadata: annotations: ran.openshift.io/ztp-deploy-wave: "100" spec: storage: pvc: claim: "image-registry-pvc"
- 1
- Set the appropriate value for
ztp-deploy-wave
depending on whether you are configuring image registries at the site, common, or group level.ztp-deploy-wave: "100"
is suitable for development or testing because it allows you to group the referenced source files together. - 2
- In
ImageRegistryPV.yaml
, ensure that thespec.local.path
field is set to/var/imageregistry
to match the value set for themount_point
field in theSiteConfig
CR.
ImportantDo not set
complianceType: mustonlyhave
for the- fileName: ImageRegistryConfig.yaml
configuration. This can cause the registry pod deployment to fail.-
Commit the
PolicyGenTemplate
change in Git, and then push to the Git repository being monitored by the GitOps ZTP ArgoCD application.
Verification
Use the following steps to troubleshoot errors with the local image registry on the managed clusters:
Verify successful login to the registry while logged in to the managed cluster. Run the following commands:
Export the managed cluster name:
$ cluster=<managed_cluster_name>
Get the managed cluster
kubeconfig
details:$ oc get secret -n $cluster $cluster-admin-password -o jsonpath='{.data.password}' | base64 -d > kubeadmin-password-$cluster
Download and export the cluster
kubeconfig
:$ oc get secret -n $cluster $cluster-admin-kubeconfig -o jsonpath='{.data.kubeconfig}' | base64 -d > kubeconfig-$cluster && export KUBECONFIG=./kubeconfig-$cluster
- Verify access to the image registry from the managed cluster. See "Accessing the registry".
Check that the
Config
CRD in theimageregistry.operator.openshift.io
group instance is not reporting errors. Run the following command while logged in to the managed cluster:$ oc get image.config.openshift.io cluster -o yaml
Example output
apiVersion: config.openshift.io/v1 kind: Image metadata: annotations: include.release.openshift.io/ibm-cloud-managed: "true" include.release.openshift.io/self-managed-high-availability: "true" include.release.openshift.io/single-node-developer: "true" release.openshift.io/create-only: "true" creationTimestamp: "2021-10-08T19:02:39Z" generation: 5 name: cluster resourceVersion: "688678648" uid: 0406521b-39c0-4cda-ba75-873697da75a4 spec: additionalTrustedCA: name: acm-ice
Check that the
PersistentVolumeClaim
on the managed cluster is populated with data. Run the following command while logged in to the managed cluster:$ oc get pv image-registry-sc
Check that the
registry*
pod is running and is located under theopenshift-image-registry
namespace.$ oc get pods -n openshift-image-registry | grep registry*
Example output
cluster-image-registry-operator-68f5c9c589-42cfg 1/1 Running 0 8d image-registry-5f8987879-6nx6h 1/1 Running 0 8d
Check that the disk partition on the managed cluster is correct:
Open a debug shell to the managed cluster:
$ oc debug node/sno-1.example.com
Run
lsblk
to check the host disk partitions:sh-4.4# lsblk NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT sda 8:0 0 446.6G 0 disk |-sda1 8:1 0 1M 0 part |-sda2 8:2 0 127M 0 part |-sda3 8:3 0 384M 0 part /boot |-sda4 8:4 0 336.3G 0 part /sysroot `-sda5 8:5 0 100.1G 0 part /var/imageregistry 1 sdb 8:16 0 446.6G 0 disk sr0 11:0 1 104M 0 rom
- 1
/var/imageregistry
indicates that the disk is correctly partitioned.
Additional resources
10.3. Updating managed clusters in a disconnected environment with PolicyGenTemplate resources and TALM
You can use the Topology Aware Lifecycle Manager (TALM) to manage the software lifecycle of managed clusters that you have deployed by using GitOps Zero Touch Provisioning (ZTP) and Topology Aware Lifecycle Manager (TALM). TALM uses Red Hat Advanced Cluster Management (RHACM) PolicyGenTemplate policies to manage and control changes applied to target clusters.
Using PolicyGenTemplate
CRs to manage and deploy polices to managed clusters will be deprecated in an upcoming OpenShift Container Platform release. Equivalent and improved functionality is available using Red Hat Advanced Cluster Management (RHACM) and PolicyGenerator
CRs.
For more information about PolicyGenerator
resources, see the RHACM Policy Generator documentation.
Additional resources
10.3.1. Setting up the disconnected environment
TALM can perform both platform and Operator updates.
You must mirror both the platform image and Operator images that you want to update to in your mirror registry before you can use TALM to update your disconnected clusters. Complete the following steps to mirror the images:
For platform updates, you must perform the following steps:
Mirror the desired OpenShift Container Platform image repository. Ensure that the desired platform image is mirrored by following the "Mirroring the OpenShift Container Platform image repository" procedure linked in the Additional resources. Save the contents of the
imageContentSources
section in theimageContentSources.yaml
file:Example output
imageContentSources: - mirrors: - mirror-ocp-registry.ibmcloud.io.cpak:5000/openshift-release-dev/openshift4 source: quay.io/openshift-release-dev/ocp-release - mirrors: - mirror-ocp-registry.ibmcloud.io.cpak:5000/openshift-release-dev/openshift4 source: quay.io/openshift-release-dev/ocp-v4.0-art-dev
Save the image signature of the desired platform image that was mirrored. You must add the image signature to the
PolicyGenTemplate
CR for platform updates. To get the image signature, perform the following steps:Specify the desired OpenShift Container Platform tag by running the following command:
$ OCP_RELEASE_NUMBER=<release_version>
Specify the architecture of the cluster by running the following command:
$ ARCHITECTURE=<cluster_architecture> 1
- 1
- Specify the architecture of the cluster, such as
x86_64
,aarch64
,s390x
, orppc64le
.
Get the release image digest from Quay by running the following command
$ DIGEST="$(oc adm release info quay.io/openshift-release-dev/ocp-release:${OCP_RELEASE_NUMBER}-${ARCHITECTURE} | sed -n 's/Pull From: .*@//p')"
Set the digest algorithm by running the following command:
$ DIGEST_ALGO="${DIGEST%%:*}"
Set the digest signature by running the following command:
$ DIGEST_ENCODED="${DIGEST#*:}"
Get the image signature from the mirror.openshift.com website by running the following command:
$ SIGNATURE_BASE64=$(curl -s "https://mirror.openshift.com/pub/openshift-v4/signatures/openshift/release/${DIGEST_ALGO}=${DIGEST_ENCODED}/signature-1" | base64 -w0 && echo)
Save the image signature to the
checksum-<OCP_RELEASE_NUMBER>.yaml
file by running the following commands:$ cat >checksum-${OCP_RELEASE_NUMBER}.yaml <<EOF ${DIGEST_ALGO}-${DIGEST_ENCODED}: ${SIGNATURE_BASE64} EOF
Prepare the update graph. You have two options to prepare the update graph:
Use the OpenShift Update Service.
For more information about how to set up the graph on the hub cluster, see Deploy the operator for OpenShift Update Service and Build the graph data init container.
Make a local copy of the upstream graph. Host the update graph on an
http
orhttps
server in the disconnected environment that has access to the managed cluster. To download the update graph, use the following command:$ curl -s https://api.openshift.com/api/upgrades_info/v1/graph?channel=stable-4.16 -o ~/upgrade-graph_stable-4.16
For Operator updates, you must perform the following task:
- Mirror the Operator catalogs. Ensure that the desired operator images are mirrored by following the procedure in the "Mirroring Operator catalogs for use with disconnected clusters" section.
10.3.2. Performing a platform update with PolicyGenTemplate CRs
You can perform a platform update with the TALM.
Prerequisites
- Install the Topology Aware Lifecycle Manager (TALM).
- Update GitOps Zero Touch Provisioning (ZTP) to the latest version.
- Provision one or more managed clusters with GitOps ZTP.
- Mirror the desired image repository.
-
Log in as a user with
cluster-admin
privileges. - Create RHACM policies in the hub cluster.
Procedure
Create a
PolicyGenTemplate
CR for the platform update:Save the following
PolicyGenTemplate
CR in thedu-upgrade.yaml
file:Example of
PolicyGenTemplate
for platform updateapiVersion: ran.openshift.io/v1 kind: PolicyGenTemplate metadata: name: "du-upgrade" namespace: "ztp-group-du-sno" spec: bindingRules: group-du-sno: "" mcp: "master" remediationAction: inform sourceFiles: - fileName: ImageSignature.yaml 1 policyName: "platform-upgrade-prep" binaryData: ${DIGEST_ALGO}-${DIGEST_ENCODED}: ${SIGNATURE_BASE64} 2 - fileName: DisconnectedICSP.yaml policyName: "platform-upgrade-prep" metadata: name: disconnected-internal-icsp-for-ocp spec: repositoryDigestMirrors: 3 - mirrors: - quay-intern.example.com/ocp4/openshift-release-dev source: quay.io/openshift-release-dev/ocp-release - mirrors: - quay-intern.example.com/ocp4/openshift-release-dev source: quay.io/openshift-release-dev/ocp-v4.0-art-dev - fileName: ClusterVersion.yaml 4 policyName: "platform-upgrade" metadata: name: version spec: channel: "stable-4.16" upstream: http://upgrade.example.com/images/upgrade-graph_stable-4.16 desiredUpdate: version: 4.16.4 status: history: - version: 4.16.4 state: "Completed"
- 1
- The
ConfigMap
CR contains the signature of the desired release image to update to. - 2
- Shows the image signature of the desired OpenShift Container Platform release. Get the signature from the
checksum-${OCP_RELEASE_NUMBER}.yaml
file you saved when following the procedures in the "Setting up the environment" section. - 3
- Shows the mirror repository that contains the desired OpenShift Container Platform image. Get the mirrors from the
imageContentSources.yaml
file that you saved when following the procedures in the "Setting up the environment" section. - 4
- Shows the
ClusterVersion
CR to trigger the update. Thechannel
,upstream
, anddesiredVersion
fields are all required for image pre-caching.
The
PolicyGenTemplate
CR generates two policies:-
The
du-upgrade-platform-upgrade-prep
policy does the preparation work for the platform update. It creates theConfigMap
CR for the desired release image signature, creates the image content source of the mirrored release image repository, and updates the cluster version with the desired update channel and the update graph reachable by the managed cluster in the disconnected environment. -
The
du-upgrade-platform-upgrade
policy is used to perform platform upgrade.
Add the
du-upgrade.yaml
file contents to thekustomization.yaml
file located in the GitOps ZTP Git repository for thePolicyGenTemplate
CRs and push the changes to the Git repository.ArgoCD pulls the changes from the Git repository and generates the policies on the hub cluster.
Check the created policies by running the following command:
$ oc get policies -A | grep platform-upgrade
Create the
ClusterGroupUpdate
CR for the platform update with thespec.enable
field set tofalse
.Save the content of the platform update
ClusterGroupUpdate
CR with thedu-upgrade-platform-upgrade-prep
and thedu-upgrade-platform-upgrade
policies and the target clusters to thecgu-platform-upgrade.yml
file, as shown in the following example:apiVersion: ran.openshift.io/v1alpha1 kind: ClusterGroupUpgrade metadata: name: cgu-platform-upgrade namespace: default spec: managedPolicies: - du-upgrade-platform-upgrade-prep - du-upgrade-platform-upgrade preCaching: false clusters: - spoke1 remediationStrategy: maxConcurrency: 1 enable: false
Apply the
ClusterGroupUpdate
CR to the hub cluster by running the following command:$ oc apply -f cgu-platform-upgrade.yml
Optional: Pre-cache the images for the platform update.
Enable pre-caching in the
ClusterGroupUpdate
CR by running the following command:$ oc --namespace=default patch clustergroupupgrade.ran.openshift.io/cgu-platform-upgrade \ --patch '{"spec":{"preCaching": true}}' --type=merge
Monitor the update process and wait for the pre-caching to complete. Check the status of pre-caching by running the following command on the hub cluster:
$ oc get cgu cgu-platform-upgrade -o jsonpath='{.status.precaching.status}'
Start the platform update:
Enable the
cgu-platform-upgrade
policy and disable pre-caching by running the following command:$ oc --namespace=default patch clustergroupupgrade.ran.openshift.io/cgu-platform-upgrade \ --patch '{"spec":{"enable":true, "preCaching": false}}' --type=merge
Monitor the process. Upon completion, ensure that the policy is compliant by running the following command:
$ oc get policies --all-namespaces
Additional resources
10.3.3. Performing an Operator update with PolicyGenTemplate CRs
You can perform an Operator update with the TALM.
Prerequisites
- Install the Topology Aware Lifecycle Manager (TALM).
- Update GitOps Zero Touch Provisioning (ZTP) to the latest version.
- Provision one or more managed clusters with GitOps ZTP.
- Mirror the desired index image, bundle images, and all Operator images referenced in the bundle images.
-
Log in as a user with
cluster-admin
privileges. - Create RHACM policies in the hub cluster.
Procedure
Update the
PolicyGenTemplate
CR for the Operator update.Update the
du-upgrade
PolicyGenTemplate
CR with the following additional contents in thedu-upgrade.yaml
file:apiVersion: ran.openshift.io/v1 kind: PolicyGenTemplate metadata: name: "du-upgrade" namespace: "ztp-group-du-sno" spec: bindingRules: group-du-sno: "" mcp: "master" remediationAction: inform sourceFiles: - fileName: DefaultCatsrc.yaml remediationAction: inform policyName: "operator-catsrc-policy" metadata: name: redhat-operators-disconnected spec: displayName: Red Hat Operators Catalog image: registry.example.com:5000/olm/redhat-operators-disconnected:v4.16 1 updateStrategy: 2 registryPoll: interval: 1h status: connectionState: lastObservedState: READY 3
- 1
- The index image URL contains the desired Operator images. If the index images are always pushed to the same image name and tag, this change is not needed.
- 2
- Set how frequently the Operator Lifecycle Manager (OLM) polls the index image for new Operator versions with the
registryPoll.interval
field. This change is not needed if a new index image tag is always pushed for y-stream and z-stream Operator updates. TheregistryPoll.interval
field can be set to a shorter interval to expedite the update, however shorter intervals increase computational load. To counteract this behavior, you can restoreregistryPoll.interval
to the default value once the update is complete. - 3
- Last observed state of the catalog connection. The
READY
value ensures that theCatalogSource
policy is ready, indicating that the index pod is pulled and is running. This way, TALM upgrades the Operators based on up-to-date policy compliance states.
This update generates one policy,
du-upgrade-operator-catsrc-policy
, to update theredhat-operators-disconnected
catalog source with the new index images that contain the desired Operators images.NoteIf you want to use the image pre-caching for Operators and there are Operators from a different catalog source other than
redhat-operators-disconnected
, you must perform the following tasks:- Prepare a separate catalog source policy with the new index image or registry poll interval update for the different catalog source.
- Prepare a separate subscription policy for the desired Operators that are from the different catalog source.
For example, the desired SRIOV-FEC Operator is available in the
certified-operators
catalog source. To update the catalog source and the Operator subscription, add the following contents to generate two policies,du-upgrade-fec-catsrc-policy
anddu-upgrade-subscriptions-fec-policy
:apiVersion: ran.openshift.io/v1 kind: PolicyGenTemplate metadata: name: "du-upgrade" namespace: "ztp-group-du-sno" spec: bindingRules: group-du-sno: "" mcp: "master" remediationAction: inform sourceFiles: # ... - fileName: DefaultCatsrc.yaml remediationAction: inform policyName: "fec-catsrc-policy" metadata: name: certified-operators spec: displayName: Intel SRIOV-FEC Operator image: registry.example.com:5000/olm/far-edge-sriov-fec:v4.10 updateStrategy: registryPoll: interval: 10m - fileName: AcceleratorsSubscription.yaml policyName: "subscriptions-fec-policy" spec: channel: "stable" source: certified-operators
Remove the specified subscriptions channels in the common
PolicyGenTemplate
CR, if they exist. The default subscriptions channels from the GitOps ZTP image are used for the update.NoteThe default channel for the Operators applied through GitOps ZTP 4.16 is
stable
, except for theperformance-addon-operator
. As of OpenShift Container Platform 4.11, theperformance-addon-operator
functionality was moved to thenode-tuning-operator
. For the 4.10 release, the default channel for PAO isv4.10
. You can also specify the default channels in the commonPolicyGenTemplate
CR.Push the
PolicyGenTemplate
CRs updates to the GitOps ZTP Git repository.ArgoCD pulls the changes from the Git repository and generates the policies on the hub cluster.
Check the created policies by running the following command:
$ oc get policies -A | grep -E "catsrc-policy|subscription"
Apply the required catalog source updates before starting the Operator update.
Save the content of the
ClusterGroupUpgrade
CR namedoperator-upgrade-prep
with the catalog source policies and the target managed clusters to thecgu-operator-upgrade-prep.yml
file:apiVersion: ran.openshift.io/v1alpha1 kind: ClusterGroupUpgrade metadata: name: cgu-operator-upgrade-prep namespace: default spec: clusters: - spoke1 enable: true managedPolicies: - du-upgrade-operator-catsrc-policy remediationStrategy: maxConcurrency: 1
Apply the policy to the hub cluster by running the following command:
$ oc apply -f cgu-operator-upgrade-prep.yml
Monitor the update process. Upon completion, ensure that the policy is compliant by running the following command:
$ oc get policies -A | grep -E "catsrc-policy"
Create the
ClusterGroupUpgrade
CR for the Operator update with thespec.enable
field set tofalse
.Save the content of the Operator update
ClusterGroupUpgrade
CR with thedu-upgrade-operator-catsrc-policy
policy and the subscription policies created from the commonPolicyGenTemplate
and the target clusters to thecgu-operator-upgrade.yml
file, as shown in the following example:apiVersion: ran.openshift.io/v1alpha1 kind: ClusterGroupUpgrade metadata: name: cgu-operator-upgrade namespace: default spec: managedPolicies: - du-upgrade-operator-catsrc-policy 1 - common-subscriptions-policy 2 preCaching: false clusters: - spoke1 remediationStrategy: maxConcurrency: 1 enable: false
- 1
- The policy is needed by the image pre-caching feature to retrieve the operator images from the catalog source.
- 2
- The policy contains Operator subscriptions. If you have followed the structure and content of the reference
PolicyGenTemplates
, all Operator subscriptions are grouped into thecommon-subscriptions-policy
policy.
NoteOne
ClusterGroupUpgrade
CR can only pre-cache the images of the desired Operators defined in the subscription policy from one catalog source included in theClusterGroupUpgrade
CR. If the desired Operators are from different catalog sources, such as in the example of the SRIOV-FEC Operator, anotherClusterGroupUpgrade
CR must be created withdu-upgrade-fec-catsrc-policy
anddu-upgrade-subscriptions-fec-policy
policies for the SRIOV-FEC Operator images pre-caching and update.Apply the
ClusterGroupUpgrade
CR to the hub cluster by running the following command:$ oc apply -f cgu-operator-upgrade.yml
Optional: Pre-cache the images for the Operator update.
Before starting image pre-caching, verify the subscription policy is
NonCompliant
at this point by running the following command:$ oc get policy common-subscriptions-policy -n <policy_namespace>
Example output
NAME REMEDIATION ACTION COMPLIANCE STATE AGE common-subscriptions-policy inform NonCompliant 27d
Enable pre-caching in the
ClusterGroupUpgrade
CR by running the following command:$ oc --namespace=default patch clustergroupupgrade.ran.openshift.io/cgu-operator-upgrade \ --patch '{"spec":{"preCaching": true}}' --type=merge
Monitor the process and wait for the pre-caching to complete. Check the status of pre-caching by running the following command on the managed cluster:
$ oc get cgu cgu-operator-upgrade -o jsonpath='{.status.precaching.status}'
Check if the pre-caching is completed before starting the update by running the following command:
$ oc get cgu -n default cgu-operator-upgrade -ojsonpath='{.status.conditions}' | jq
Example output
[ { "lastTransitionTime": "2022-03-08T20:49:08.000Z", "message": "The ClusterGroupUpgrade CR is not enabled", "reason": "UpgradeNotStarted", "status": "False", "type": "Ready" }, { "lastTransitionTime": "2022-03-08T20:55:30.000Z", "message": "Precaching is completed", "reason": "PrecachingCompleted", "status": "True", "type": "PrecachingDone" } ]
Start the Operator update.
Enable the
cgu-operator-upgrade
ClusterGroupUpgrade
CR and disable pre-caching to start the Operator update by running the following command:$ oc --namespace=default patch clustergroupupgrade.ran.openshift.io/cgu-operator-upgrade \ --patch '{"spec":{"enable":true, "preCaching": false}}' --type=merge
Monitor the process. Upon completion, ensure that the policy is compliant by running the following command:
$ oc get policies --all-namespaces
Additional resources
10.3.4. Troubleshooting missed Operator updates with PolicyGenTemplate CRs
In some scenarios, Topology Aware Lifecycle Manager (TALM) might miss Operator updates due to an out-of-date policy compliance state.
After a catalog source update, it takes time for the Operator Lifecycle Manager (OLM) to update the subscription status. The status of the subscription policy might continue to show as compliant while TALM decides whether remediation is needed. As a result, the Operator specified in the subscription policy does not get upgraded.
To avoid this scenario, add another catalog source configuration to the PolicyGenTemplate
and specify this configuration in the subscription for any Operators that require an update.
Procedure
Add a catalog source configuration in the
PolicyGenTemplate
resource:- fileName: DefaultCatsrc.yaml remediationAction: inform policyName: "operator-catsrc-policy" metadata: name: redhat-operators-disconnected spec: displayName: Red Hat Operators Catalog image: registry.example.com:5000/olm/redhat-operators-disconnected:v{product-version} updateStrategy: registryPoll: interval: 1h status: connectionState: lastObservedState: READY - fileName: DefaultCatsrc.yaml remediationAction: inform policyName: "operator-catsrc-policy" metadata: name: redhat-operators-disconnected-v2 1 spec: displayName: Red Hat Operators Catalog v2 2 image: registry.example.com:5000/olm/redhat-operators-disconnected:<version> 3 updateStrategy: registryPoll: interval: 1h status: connectionState: lastObservedState: READY
Update the
Subscription
resource to point to the new configuration for Operators that require an update:apiVersion: operators.coreos.com/v1alpha1 kind: Subscription metadata: name: operator-subscription namespace: operator-namspace # ... spec: source: redhat-operators-disconnected-v2 1 # ...
- 1
- Enter the name of the additional catalog source configuration that you defined in the
PolicyGenTemplate
resource.
10.3.5. Performing a platform and an Operator update together
You can perform a platform and an Operator update at the same time.
Prerequisites
- Install the Topology Aware Lifecycle Manager (TALM).
- Update GitOps Zero Touch Provisioning (ZTP) to the latest version.
- Provision one or more managed clusters with GitOps ZTP.
-
Log in as a user with
cluster-admin
privileges. - Create RHACM policies in the hub cluster.
Procedure
-
Create the
PolicyGenTemplate
CR for the updates by following the steps described in the "Performing a platform update" and "Performing an Operator update" sections. Apply the prep work for the platform and the Operator update.
Save the content of the
ClusterGroupUpgrade
CR with the policies for platform update preparation work, catalog source updates, and target clusters to thecgu-platform-operator-upgrade-prep.yml
file, for example:apiVersion: ran.openshift.io/v1alpha1 kind: ClusterGroupUpgrade metadata: name: cgu-platform-operator-upgrade-prep namespace: default spec: managedPolicies: - du-upgrade-platform-upgrade-prep - du-upgrade-operator-catsrc-policy clusterSelector: - group-du-sno remediationStrategy: maxConcurrency: 10 enable: true
Apply the
cgu-platform-operator-upgrade-prep.yml
file to the hub cluster by running the following command:$ oc apply -f cgu-platform-operator-upgrade-prep.yml
Monitor the process. Upon completion, ensure that the policy is compliant by running the following command:
$ oc get policies --all-namespaces
Create the
ClusterGroupUpdate
CR for the platform and the Operator update with thespec.enable
field set tofalse
.Save the contents of the platform and Operator update
ClusterGroupUpdate
CR with the policies and the target clusters to thecgu-platform-operator-upgrade.yml
file, as shown in the following example:apiVersion: ran.openshift.io/v1alpha1 kind: ClusterGroupUpgrade metadata: name: cgu-du-upgrade namespace: default spec: managedPolicies: - du-upgrade-platform-upgrade 1 - du-upgrade-operator-catsrc-policy 2 - common-subscriptions-policy 3 preCaching: true clusterSelector: - group-du-sno remediationStrategy: maxConcurrency: 1 enable: false
Apply the
cgu-platform-operator-upgrade.yml
file to the hub cluster by running the following command:$ oc apply -f cgu-platform-operator-upgrade.yml
Optional: Pre-cache the images for the platform and the Operator update.
Enable pre-caching in the
ClusterGroupUpgrade
CR by running the following command:$ oc --namespace=default patch clustergroupupgrade.ran.openshift.io/cgu-du-upgrade \ --patch '{"spec":{"preCaching": true}}' --type=merge
Monitor the update process and wait for the pre-caching to complete. Check the status of pre-caching by running the following command on the managed cluster:
$ oc get jobs,pods -n openshift-talm-pre-cache
Check if the pre-caching is completed before starting the update by running the following command:
$ oc get cgu cgu-du-upgrade -ojsonpath='{.status.conditions}'
Start the platform and Operator update.
Enable the
cgu-du-upgrade
ClusterGroupUpgrade
CR to start the platform and the Operator update by running the following command:$ oc --namespace=default patch clustergroupupgrade.ran.openshift.io/cgu-du-upgrade \ --patch '{"spec":{"enable":true, "preCaching": false}}' --type=merge
Monitor the process. Upon completion, ensure that the policy is compliant by running the following command:
$ oc get policies --all-namespaces
NoteThe CRs for the platform and Operator updates can be created from the beginning by configuring the setting to
spec.enable: true
. In this case, the update starts immediately after pre-caching completes and there is no need to manually enable the CR.Both pre-caching and the update create extra resources, such as policies, placement bindings, placement rules, managed cluster actions, and managed cluster view, to help complete the procedures. Setting the
afterCompletion.deleteObjects
field totrue
deletes all these resources after the updates complete.
10.3.6. Removing Performance Addon Operator subscriptions from deployed clusters with PolicyGenTemplate CRs
In earlier versions of OpenShift Container Platform, the Performance Addon Operator provided automatic, low latency performance tuning for applications. In OpenShift Container Platform 4.11 or later, these functions are part of the Node Tuning Operator.
Do not install the Performance Addon Operator on clusters running OpenShift Container Platform 4.11 or later. If you upgrade to OpenShift Container Platform 4.11 or later, the Node Tuning Operator automatically removes the Performance Addon Operator.
You need to remove any policies that create Performance Addon Operator subscriptions to prevent a re-installation of the Operator.
The reference DU profile includes the Performance Addon Operator in the PolicyGenTemplate
CR common-ranGen.yaml
. To remove the subscription from deployed managed clusters, you must update common-ranGen.yaml
.
If you install Performance Addon Operator 4.10.3-5 or later on OpenShift Container Platform 4.11 or later, the Performance Addon Operator detects the cluster version and automatically hibernates to avoid interfering with the Node Tuning Operator functions. However, to ensure best performance, remove the Performance Addon Operator from your OpenShift Container Platform 4.11 clusters.
Prerequisites
- Create a Git repository where you manage your custom site configuration data. The repository must be accessible from the hub cluster and be defined as a source repository for ArgoCD.
- Update to OpenShift Container Platform 4.11 or later.
-
Log in as a user with
cluster-admin
privileges.
Procedure
Change the
complianceType
tomustnothave
for the Performance Addon Operator namespace, Operator group, and subscription in thecommon-ranGen.yaml
file.- fileName: PaoSubscriptionNS.yaml policyName: "subscriptions-policy" complianceType: mustnothave - fileName: PaoSubscriptionOperGroup.yaml policyName: "subscriptions-policy" complianceType: mustnothave - fileName: PaoSubscription.yaml policyName: "subscriptions-policy" complianceType: mustnothave
-
Merge the changes with your custom site repository and wait for the ArgoCD application to synchronize the change to the hub cluster. The status of the
common-subscriptions-policy
policy changes toNon-Compliant
. - Apply the change to your target clusters by using the Topology Aware Lifecycle Manager. For more information about rolling out configuration changes, see the "Additional resources" section.
Monitor the process. When the status of the
common-subscriptions-policy
policy for a target cluster isCompliant
, the Performance Addon Operator has been removed from the cluster. Get the status of thecommon-subscriptions-policy
by running the following command:$ oc get policy -n ztp-common common-subscriptions-policy
-
Delete the Performance Addon Operator namespace, Operator group and subscription CRs from
spec.sourceFiles
in thecommon-ranGen.yaml
file. - Merge the changes with your custom site repository and wait for the ArgoCD application to synchronize the change to the hub cluster. The policy remains compliant.
10.3.7. Pre-caching user-specified images with TALM on single-node OpenShift clusters
You can pre-cache application-specific workload images on single-node OpenShift clusters before upgrading your applications.
You can specify the configuration options for the pre-caching jobs using the following custom resources (CR):
-
PreCachingConfig
CR -
ClusterGroupUpgrade
CR
All fields in the PreCachingConfig
CR are optional.
Example PreCachingConfig CR
apiVersion: ran.openshift.io/v1alpha1 kind: PreCachingConfig metadata: name: exampleconfig namespace: exampleconfig-ns spec: overrides: 1 platformImage: quay.io/openshift-release-dev/ocp-release@sha256:3d5800990dee7cd4727d3fe238a97e2d2976d3808fc925ada29c559a47e2e1ef operatorsIndexes: - registry.example.com:5000/custom-redhat-operators:1.0.0 operatorsPackagesAndChannels: - local-storage-operator: stable - ptp-operator: stable - sriov-network-operator: stable spaceRequired: 30 Gi 2 excludePrecachePatterns: 3 - aws - vsphere additionalImages: 4 - quay.io/exampleconfig/application1@sha256:3d5800990dee7cd4727d3fe238a97e2d2976d3808fc925ada29c559a47e2e1ef - quay.io/exampleconfig/application2@sha256:3d5800123dee7cd4727d3fe238a97e2d2976d3808fc925ada29c559a47adfaef - quay.io/exampleconfig/applicationN@sha256:4fe1334adfafadsf987123adfffdaf1243340adfafdedga0991234afdadfsa09
- 1
- By default, TALM automatically populates the
platformImage
,operatorsIndexes
, and theoperatorsPackagesAndChannels
fields from the policies of the managed clusters. You can specify values to override the default TALM-derived values for these fields. - 2
- Specifies the minimum required disk space on the cluster. If unspecified, TALM defines a default value for OpenShift Container Platform images. The disk space field must include an integer value and the storage unit. For example:
40 GiB
,200 MB
,1 TiB
. - 3
- Specifies the images to exclude from pre-caching based on image name matching.
- 4
- Specifies the list of additional images to pre-cache.
Example ClusterGroupUpgrade CR with PreCachingConfig CR reference
apiVersion: ran.openshift.io/v1alpha1 kind: ClusterGroupUpgrade metadata: name: cgu spec: preCaching: true 1 preCachingConfigRef: name: exampleconfig 2 namespace: exampleconfig-ns 3
10.3.7.1. Creating the custom resources for pre-caching
You must create the PreCachingConfig
CR before or concurrently with the ClusterGroupUpgrade
CR.
Create the
PreCachingConfig
CR with the list of additional images you want to pre-cache.apiVersion: ran.openshift.io/v1alpha1 kind: PreCachingConfig metadata: name: exampleconfig namespace: default 1 spec: [...] spaceRequired: 30Gi 2 additionalImages: - quay.io/exampleconfig/application1@sha256:3d5800990dee7cd4727d3fe238a97e2d2976d3808fc925ada29c559a47e2e1ef - quay.io/exampleconfig/application2@sha256:3d5800123dee7cd4727d3fe238a97e2d2976d3808fc925ada29c559a47adfaef - quay.io/exampleconfig/applicationN@sha256:4fe1334adfafadsf987123adfffdaf1243340adfafdedga0991234afdadfsa09
Create a
ClusterGroupUpgrade
CR with thepreCaching
field set totrue
and specify thePreCachingConfig
CR created in the previous step:apiVersion: ran.openshift.io/v1alpha1 kind: ClusterGroupUpgrade metadata: name: cgu namespace: default spec: clusters: - sno1 - sno2 preCaching: true preCachingConfigRef: - name: exampleconfig namespace: default managedPolicies: - du-upgrade-platform-upgrade - du-upgrade-operator-catsrc-policy - common-subscriptions-policy remediationStrategy: timeout: 240
WarningOnce you install the images on the cluster, you cannot change or delete them.
When you want to start pre-caching the images, apply the
ClusterGroupUpgrade
CR by running the following command:$ oc apply -f cgu.yaml
TALM verifies the ClusterGroupUpgrade
CR.
From this point, you can continue with the TALM pre-caching workflow.
All sites are pre-cached concurrently.
Verification
Check the pre-caching status on the hub cluster where the
ClusterUpgradeGroup
CR is applied by running the following command:$ oc get cgu <cgu_name> -n <cgu_namespace> -oyaml
Example output
precaching: spec: platformImage: quay.io/openshift-release-dev/ocp-release@sha256:3d5800990dee7cd4727d3fe238a97e2d2976d3808fc925ada29c559a47e2e1ef operatorsIndexes: - registry.example.com:5000/custom-redhat-operators:1.0.0 operatorsPackagesAndChannels: - local-storage-operator: stable - ptp-operator: stable - sriov-network-operator: stable excludePrecachePatterns: - aws - vsphere additionalImages: - quay.io/exampleconfig/application1@sha256:3d5800990dee7cd4727d3fe238a97e2d2976d3808fc925ada29c559a47e2e1ef - quay.io/exampleconfig/application2@sha256:3d5800123dee7cd4727d3fe238a97e2d2976d3808fc925ada29c559a47adfaef - quay.io/exampleconfig/applicationN@sha256:4fe1334adfafadsf987123adfffdaf1243340adfafdedga0991234afdadfsa09 spaceRequired: "30" status: sno1: Starting sno2: Starting
The pre-caching configurations are validated by checking if the managed policies exist. Valid configurations of the
ClusterGroupUpgrade
and thePreCachingConfig
CRs result in the following statuses:Example output of valid CRs
- lastTransitionTime: "2023-01-01T00:00:01Z" message: All selected clusters are valid reason: ClusterSelectionCompleted status: "True" type: ClusterSelected - lastTransitionTime: "2023-01-01T00:00:02Z" message: Completed validation reason: ValidationCompleted status: "True" type: Validated - lastTransitionTime: "2023-01-01T00:00:03Z" message: Precaching spec is valid and consistent reason: PrecacheSpecIsWellFormed status: "True" type: PrecacheSpecValid - lastTransitionTime: "2023-01-01T00:00:04Z" message: Precaching in progress for 1 clusters reason: InProgress status: "False" type: PrecachingSucceeded
Example of an invalid PreCachingConfig CR
Type: "PrecacheSpecValid" Status: False, Reason: "PrecacheSpecIncomplete" Message: "Precaching spec is incomplete: failed to get PreCachingConfig resource due to PreCachingConfig.ran.openshift.io "<pre-caching_cr_name>" not found"
You can find the pre-caching job by running the following command on the managed cluster:
$ oc get jobs -n openshift-talo-pre-cache
Example of pre-caching job in progress
NAME COMPLETIONS DURATION AGE pre-cache 0/1 1s 1s
You can check the status of the pod created for the pre-caching job by running the following command:
$ oc describe pod pre-cache -n openshift-talo-pre-cache
Example of pre-caching job in progress
Type Reason Age From Message Normal SuccesfulCreate 19s job-controller Created pod: pre-cache-abcd1
You can get live updates on the status of the job by running the following command:
$ oc logs -f pre-cache-abcd1 -n openshift-talo-pre-cache
To verify the pre-cache job is successfully completed, run the following command:
$ oc describe pod pre-cache -n openshift-talo-pre-cache
Example of completed pre-cache job
Type Reason Age From Message Normal SuccesfulCreate 5m19s job-controller Created pod: pre-cache-abcd1 Normal Completed 19s job-controller Job completed
To verify that the images are successfully pre-cached on the single-node OpenShift, do the following:
Enter into the node in debug mode:
$ oc debug node/cnfdf00.example.lab
Change root to
host
:$ chroot /host/
Search for the desired images:
$ sudo podman images | grep <operator_name>
Additional resources
10.3.8. About the auto-created ClusterGroupUpgrade CR for GitOps ZTP
TALM has a controller called ManagedClusterForCGU
that monitors the Ready
state of the ManagedCluster
CRs on the hub cluster and creates the ClusterGroupUpgrade
CRs for GitOps Zero Touch Provisioning (ZTP).
For any managed cluster in the Ready
state without a ztp-done
label applied, the ManagedClusterForCGU
controller automatically creates a ClusterGroupUpgrade
CR in the ztp-install
namespace with its associated RHACM policies that are created during the GitOps ZTP process. TALM then remediates the set of configuration policies that are listed in the auto-created ClusterGroupUpgrade
CR to push the configuration CRs to the managed cluster.
If there are no policies for the managed cluster at the time when the cluster becomes Ready
, a ClusterGroupUpgrade
CR with no policies is created. Upon completion of the ClusterGroupUpgrade
the managed cluster is labeled as ztp-done
. If there are policies that you want to apply for that managed cluster, manually create a ClusterGroupUpgrade
as a day-2 operation.
Example of an auto-created ClusterGroupUpgrade
CR for GitOps ZTP
apiVersion: ran.openshift.io/v1alpha1 kind: ClusterGroupUpgrade metadata: generation: 1 name: spoke1 namespace: ztp-install ownerReferences: - apiVersion: cluster.open-cluster-management.io/v1 blockOwnerDeletion: true controller: true kind: ManagedCluster name: spoke1 uid: 98fdb9b2-51ee-4ee7-8f57-a84f7f35b9d5 resourceVersion: "46666836" uid: b8be9cd2-764f-4a62-87d6-6b767852c7da spec: actions: afterCompletion: addClusterLabels: ztp-done: "" 1 deleteClusterLabels: ztp-running: "" deleteObjects: true beforeEnable: addClusterLabels: ztp-running: "" 2 clusters: - spoke1 enable: true managedPolicies: - common-spoke1-config-policy - common-spoke1-subscriptions-policy - group-spoke1-config-policy - spoke1-config-policy - group-spoke1-validator-du-policy preCaching: false remediationStrategy: maxConcurrency: 1 timeout: 240