Chapter 10. Managing cluster policies with PolicyGenTemplate resources

10.1. Configuring managed cluster policies by using PolicyGenTemplate resources

Applied Policy custom resources (CRs) configure the managed clusters that you provision. You can customize how Red Hat Advanced Cluster Management (RHACM) uses PolicyGenTemplate CRs to generate the applied Policy CRs.

Important

Using PolicyGenTemplate CRs to manage and deploy policies to managed clusters will be deprecated in an upcoming OpenShift Container Platform release. Equivalent and improved functionality is available using Red Hat Advanced Cluster Management (RHACM) and PolicyGenerator CRs.

For more information about PolicyGenerator resources, see the RHACM Policy Generator documentation.

10.1.1. About the PolicyGenTemplate CRD

The PolicyGenTemplate custom resource definition (CRD) tells the PolicyGen policy generator what custom resources (CRs) to include in the cluster configuration, how to combine the CRs into the generated policies, and what items in those CRs need to be updated with overlay content.

The following example shows a PolicyGenTemplate CR (common-ranGen.yaml) extracted from the ztp-site-generate reference container. The common-ranGen.yaml file defines two Red Hat Advanced Cluster Management (RHACM) policies. Each policy manages a collection of configuration CRs, and one policy is generated for each unique value of policyName in the CR. common-ranGen.yaml creates a single placement binding and a placement rule to bind the policies to clusters based on the labels listed in the spec.bindingRules section.

Example PolicyGenTemplate CR - common-ranGen.yaml

```
apiVersion: ran.openshift.io/v1
kind: PolicyGenTemplate
metadata:
  name: "common-latest"
  namespace: "ztp-common"
spec:
  bindingRules:
    common: "true" # 1
    du-profile: "latest"
  sourceFiles: # 2
    - fileName: SriovSubscriptionNS.yaml
      policyName: "subscriptions-policy"
    - fileName: SriovSubscriptionOperGroup.yaml
      policyName: "subscriptions-policy"
    - fileName: SriovSubscription.yaml
      policyName: "subscriptions-policy"
    - fileName: SriovOperatorStatus.yaml
      policyName: "subscriptions-policy"
    - fileName: PtpSubscriptionNS.yaml
      policyName: "subscriptions-policy"
    - fileName: PtpSubscriptionOperGroup.yaml
      policyName: "subscriptions-policy"
    - fileName: PtpSubscription.yaml
      policyName: "subscriptions-policy"
    - fileName: PtpOperatorStatus.yaml
      policyName: "subscriptions-policy"
    - fileName: ClusterLogNS.yaml
      policyName: "subscriptions-policy"
    - fileName: ClusterLogOperGroup.yaml
      policyName: "subscriptions-policy"
    - fileName: ClusterLogSubscription.yaml
      policyName: "subscriptions-policy"
    - fileName: ClusterLogOperatorStatus.yaml
      policyName: "subscriptions-policy"
    - fileName: StorageNS.yaml
      policyName: "subscriptions-policy"
    - fileName: StorageOperGroup.yaml
      policyName: "subscriptions-policy"
    - fileName: StorageSubscription.yaml
      policyName: "subscriptions-policy"
    - fileName: StorageOperatorStatus.yaml
      policyName: "subscriptions-policy"
    - fileName: DefaultCatsrc.yaml # 3
      policyName: "config-policy" # 4
      metadata:
        name: redhat-operators-disconnected
      spec:
        displayName: disconnected-redhat-operators
        image: registry.example.com:5000/disconnected-redhat-operators/disconnected-redhat-operator-index:v4.9
    - fileName: DisconnectedICSP.yaml
      policyName: "config-policy"
      spec:
        repositoryDigestMirrors:
        - mirrors:
          - registry.example.com:5000
          source: registry.redhat.io
```

1: common: "true" applies the policies to all clusters with this label.
2: Files listed under sourceFiles create the Operator policies for installed clusters.
3: DefaultCatsrc.yaml configures the catalog source for the disconnected registry.
4: policyName: "config-policy" configures Operator subscriptions. The OperatorHub CR disables the default and this CR replaces redhat-operators with a CatalogSource CR that points to the disconnected registry.
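Because one policy is generated for each unique policyName value, you can estimate how many policies a PolicyGenTemplate file produces before you commit it. The following Bash functions are a minimal sketch and are not part of the GitOps ZTP tooling; the names list_policy_names and count_policies are hypothetical. They assume one policyName key per line, as in the example above, and do not parse YAML in general.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list the distinct policyName values in a
# PolicyGenTemplate file (or stdin). Each distinct value corresponds to
# one generated policy. Line-oriented sketch, not a YAML parser.
list_policy_names() {
  grep -E '^[[:space:]]*policyName:' "${1:-/dev/stdin}" \
    | sed -E 's/^[[:space:]]*policyName:[[:space:]]*//; s/^"//; s/"[[:space:]]*$//' \
    | sort -u
}

# Hypothetical helper: count the policies that the file would generate.
count_policies() {
  list_policy_names "$1" | wc -l | tr -d ' '
}
```

For the common-ranGen.yaml example above, count_policies common-ranGen.yaml is expected to print 2, one for subscriptions-policy and one for config-policy.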

A PolicyGenTemplate CR can be constructed with any number of included CRs. Apply the following example CR in the hub cluster to generate a policy containing a single CR:

```
apiVersion: ran.openshift.io/v1
kind: PolicyGenTemplate
metadata:
  name: "group-du-sno"
  namespace: "ztp-group"
spec:
  bindingRules:
    group-du-sno: ""
  mcp: "master"
  sourceFiles:
    - fileName: PtpConfigSlave.yaml
      policyName: "config-policy"
      metadata:
        name: "du-ptp-slave"
      spec:
        profile:
        - name: "slave"
          interface: "ens5f0"
          ptp4lOpts: "-2 -s --summary_interval -4"
          phc2sysOpts: "-a -r -n 24"
```

In this example, the source file PtpConfigSlave.yaml defines a PtpConfig CR. The generated policy for the PtpConfigSlave example is named group-du-sno-config-policy, and the PtpConfig CR defined in that policy is named du-ptp-slave. The spec defined in PtpConfigSlave.yaml is placed under du-ptp-slave, along with the other spec items that are defined for the source file in the PolicyGenTemplate CR.
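The naming described above, <PolicyGenTemplate name>-<policyName>, can be checked mechanically. The following Bash function is a hypothetical, line-oriented sketch (not a YAML parser) that prints which generated policy each source file lands in. It assumes that the top-level metadata.name key is indented by exactly two spaces and that every sourceFiles entry sets policyName.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "<fileName> -> <generated policy name>" for each
# sourceFiles entry, using the <PolicyGenTemplate name>-<policyName> naming.
# Assumes the top-level metadata.name is indented by exactly two spaces.
map_generated_policies() {
  awk '
    # Strip the key, the colon, double quotes, and trailing blanks from a line.
    function value(line) {
      sub(/^[^:]*:[ \t]*/, "", line)
      gsub(/"/, "", line)
      sub(/[ \t]+$/, "", line)
      return line
    }
    # First two-space "name:" line is the PolicyGenTemplate name.
    /^  name:/ && pgt == "" { pgt = value($0); next }
    # Remember the current source file.
    /^[ \t]*- fileName:/ { file = value($0); next }
    # Each policyName line completes one mapping.
    /^[ \t]*policyName:/ { print file " -> " pgt "-" value($0) }
  ' "${1:-/dev/stdin}"
}
```

Run against the group-du-sno example above, the function is expected to print PtpConfigSlave.yaml -> group-du-sno-config-policy.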

The following example shows the group-du-sno-config-policy CR:

```
apiVersion: policy.open-cluster-management.io/v1
kind: Policy
metadata:
  name: group-du-sno-config-policy
  namespace: ztp-group
  annotations:
    policy.open-cluster-management.io/categories: CM Configuration Management
    policy.open-cluster-management.io/controls: CM-2 Baseline Configuration
    policy.open-cluster-management.io/standards: NIST SP 800-53
spec:
    remediationAction: inform
    disabled: false
    policy-templates:
        - objectDefinition:
            apiVersion: policy.open-cluster-management.io/v1
            kind: ConfigurationPolicy
            metadata:
                name: group-du-sno-config-policy-config
            spec:
                remediationAction: inform
                severity: low
                namespaceselector:
                    exclude:
                        - kube-*
                    include:
                        - '*'
                object-templates:
                    - complianceType: musthave
                      objectDefinition:
                        apiVersion: ptp.openshift.io/v1
                        kind: PtpConfig
                        metadata:
                            name: du-ptp-slave
                            namespace: openshift-ptp
                        spec:
                            recommend:
                                - match:
                                    - nodeLabel: node-role.kubernetes.io/worker-du
                                  priority: 4
                                  profile: slave
                            profile:
                                - interface: ens5f0
                                  name: slave
                                  phc2sysOpts: -a -r -n 24
                                  ptp4lConf: |
                                    [global]
                                    #
                                    # Default Data Set
                                    #
                                    twoStepFlag 1
                                    slaveOnly 0
                                    priority1 128
                                    priority2 128
                                    domainNumber 24
```

10.1.2. Recommendations when customizing PolicyGenTemplate CRs

Consider the following best practices when customizing site configuration PolicyGenTemplate custom resources (CRs):

Use as few policies as are necessary. Using fewer policies requires fewer resources. Each additional policy increases CPU load on the hub cluster and the deployed managed cluster. CRs are combined into policies based on the policyName field in the PolicyGenTemplate CR. CRs in the same PolicyGenTemplate that have the same value for policyName are managed under a single policy.
In disconnected environments, use a single catalog source for all Operators by configuring the registry as a single index containing all Operators. Each additional CatalogSource CR on the managed clusters increases CPU usage.
MachineConfig CRs should be included as extraManifests in the SiteConfig CR so that they are applied during installation. This can reduce the overall time taken until the cluster is ready to deploy applications.
PolicyGenTemplate CRs should override the channel field to explicitly identify the desired version. This ensures that changes in the source CR during upgrades do not update the generated subscription.
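To act on the channel recommendation, you can scan a PolicyGenTemplate file for Subscription source files that do not override the channel field. The following Bash function is a hypothetical sketch: it treats any sourceFiles entry whose fileName ends in Subscription.yaml as a Subscription, and any channel key inside that entry as a spec.channel override. It is line-oriented and does not parse YAML in general.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list sourceFiles entries whose fileName ends in
# "Subscription.yaml" but that contain no "channel:" key.
# Line-oriented sketch, not a YAML parser.
find_unpinned_subscriptions() {
  awk '
    # Print the pending entry if it is a Subscription without a channel key.
    function flush() {
      if (file != "" && !pinned) print file
      file = ""; pinned = 0
    }
    /^[ \t]*- fileName:/ {
      flush()
      name = $0
      sub(/^[^:]*:[ \t]*/, "", name)
      gsub(/"/, "", name)
      sub(/[ \t]+$/, "", name)
      if (name ~ /Subscription\.yaml$/) file = name
      next
    }
    /^[ \t]*channel:/ { pinned = 1 }
    END { flush() }
  ' "${1:-/dev/stdin}"
}
```

Any file name that the function prints is a candidate for an explicit channel override in the PolicyGenTemplate CR.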

Note

When managing large numbers of spoke clusters on the hub cluster, minimize the number of policies to reduce resource consumption.

Grouping multiple configuration CRs into a single or limited number of policies is one way to reduce the overall number of policies on the hub cluster. When using the common, group, and site hierarchy of policies for managing site configuration, it is especially important to combine site-specific configurations into a single policy.

10.1.3. PolicyGenTemplate CRs for RAN deployments

Use PolicyGenTemplate custom resources (CRs) to customize the configuration applied to the cluster by using the GitOps Zero Touch Provisioning (ZTP) pipeline. The PolicyGenTemplate CR allows you to generate one or more policies to manage the set of configuration CRs on your fleet of clusters. The PolicyGenTemplate CR identifies the set of managed CRs, bundles them into policies, builds the policy wrapping around those CRs, and associates the policies with clusters by using label binding rules.

The reference configuration, obtained from the GitOps ZTP container, is designed to provide a set of critical features and node tuning settings that ensure the cluster can support the stringent performance and resource utilization constraints typical of RAN (Radio Access Network) Distributed Unit (DU) applications. Changes or omissions from the baseline configuration can affect feature availability, performance, and resource utilization. Use the reference PolicyGenTemplate CRs as the basis to create a hierarchy of configuration files tailored to your specific site requirements.

The baseline PolicyGenTemplate CRs that are defined for RAN DU cluster configuration can be extracted from the GitOps ZTP ztp-site-generate container. See "Preparing the GitOps ZTP site configuration repository" for further details.

The PolicyGenTemplate CRs can be found in the ./out/argocd/example/policygentemplates folder. The reference architecture has common, group, and site-specific configuration CRs. Each PolicyGenTemplate CR refers to other CRs that can be found in the ./out/source-crs folder.

The PolicyGenTemplate CRs relevant to RAN cluster configuration are described below. Variants are provided for the group PolicyGenTemplate CRs to account for differences in single-node, three-node compact, and standard cluster configurations. Similarly, site-specific configuration variants are provided for single-node clusters and multi-node (compact or standard) clusters. Use the group and site-specific configuration variants that are relevant for your deployment.


Table 10.1. PolicyGenTemplate CRs for RAN deployments

| PolicyGenTemplate CR | Description |
|---|---|
| `example-multinode-site.yaml` | Contains a set of CRs that are applied to multi-node clusters. These CRs configure SR-IOV features typical for RAN installations. |
| `example-sno-site.yaml` | Contains a set of CRs that are applied to single-node OpenShift clusters. These CRs configure SR-IOV features typical for RAN installations. |
| `common-mno-ranGen.yaml` | Contains a set of common RAN policy configuration that is applied to multi-node clusters. |
| `common-ranGen.yaml` | Contains a set of common RAN CRs that are applied to all clusters. These CRs subscribe to a set of Operators providing cluster features typical for RAN as well as baseline cluster tuning. |
| `group-du-3node-ranGen.yaml` | Contains the RAN policies for three-node clusters only. |
| `group-du-sno-ranGen.yaml` | Contains the RAN policies for single-node clusters only. |
| `group-du-standard-ranGen.yaml` | Contains the RAN policies for standard clusters with three control-plane nodes. |
| `group-du-3node-validator-ranGen.yaml` | `PolicyGenTemplate` CR used to generate the various policies required for three-node clusters. |
| `group-du-standard-validator-ranGen.yaml` | `PolicyGenTemplate` CR used to generate the various policies required for standard clusters. |
| `group-du-sno-validator-ranGen.yaml` | `PolicyGenTemplate` CR used to generate the various policies required for single-node OpenShift clusters. |
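If you script the selection of reference files, the group and site-specific variants in Table 10.1 can be expressed as a small lookup. The following Bash function is a hypothetical sketch; the cluster type keywords sno, 3node, and standard are assumptions chosen for illustration, and the paths assume the ./out/argocd/example/policygentemplates folder described above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the reference group and site PolicyGenTemplate
# files for a cluster type, following Table 10.1.
# The type keywords (sno, 3node, standard) are illustrative.
reference_pgts_for() {
  local dir=./out/argocd/example/policygentemplates
  case "$1" in
    sno)      printf '%s\n' "$dir/group-du-sno-ranGen.yaml" "$dir/example-sno-site.yaml" ;;
    3node)    printf '%s\n' "$dir/group-du-3node-ranGen.yaml" "$dir/example-multinode-site.yaml" ;;
    standard) printf '%s\n' "$dir/group-du-standard-ranGen.yaml" "$dir/example-multinode-site.yaml" ;;
    *) echo "unknown cluster type: $1 (expected sno, 3node, or standard)" >&2
       return 1 ;;
  esac
}
```

For example, reference_pgts_for sno prints the path of group-du-sno-ranGen.yaml followed by the path of example-sno-site.yaml.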

10.1.4. Customizing a managed cluster with PolicyGenTemplate CRs

Use the following procedure to customize the policies that get applied to the managed cluster that you provision using the GitOps Zero Touch Provisioning (ZTP) pipeline.

Prerequisites

You have installed the OpenShift CLI (oc).
You have logged in to the hub cluster as a user with cluster-admin privileges.
You have configured the hub cluster for generating the required installation and policy CRs.
You have created a Git repository where you manage your custom site configuration data. The repository must be accessible from the hub cluster and be defined as a source repository for the Argo CD application.

Procedure

1. Create a PolicyGenTemplate CR for site-specific configuration CRs.
   a. Choose the appropriate example for your CR from the out/argocd/example/policygentemplates folder, for example, example-sno-site.yaml or example-multinode-site.yaml.
   b. Change the spec.bindingRules field in the example file to match the site-specific label included in the SiteConfig CR. In the example SiteConfig file, the site-specific label is sites: example-sno.
      Note
      Ensure that the labels defined in your PolicyGenTemplate spec.bindingRules field correspond to the labels that are defined in the related managed cluster's SiteConfig CR.
   c. Change the content in the example file to match the desired configuration.
2. Optional: Create a PolicyGenTemplate CR for any common configuration CRs that apply to the entire fleet of clusters.
   a. Select the appropriate example for your CR from the out/argocd/example/policygentemplates folder, for example, common-ranGen.yaml.
   b. Change the content in the example file to match the required configuration.
3. Optional: Create a PolicyGenTemplate CR for any group configuration CRs that apply to certain groups of clusters in the fleet.
   Ensure that the content of the overlaid spec files matches your required end state. As a reference, the out/source-crs directory contains the full list of source CRs available to be included and overlaid by your PolicyGenTemplate CRs.
   Note
   Depending on the specific requirements of your clusters, you might need more than a single group policy per cluster type, especially considering that the example group policies each have a single PerformanceProfile.yaml file that can only be shared across a set of clusters if those clusters consist of identical hardware configurations.
   a. Select the appropriate example for your CR from the out/argocd/example/policygentemplates folder, for example, group-du-sno-ranGen.yaml.
   b. Change the content in the example file to match the required configuration.
4. Optional: Create a validator inform policy PolicyGenTemplate CR to signal when the GitOps ZTP installation and configuration of the deployed cluster is complete. For more information, see "Creating a validator inform policy".
5. Define all the policy namespaces in a YAML file similar to the example out/argocd/example/policygentemplates/ns.yaml file.
   Important
   Do not include the Namespace CR in the same file with the PolicyGenTemplate CR.
6. Add the PolicyGenTemplate CRs and Namespace CR to the kustomization.yaml file in the generators section, similar to the example shown in out/argocd/example/policygentemplates/kustomization.yaml.
7. Commit the PolicyGenTemplate CRs, Namespace CR, and associated kustomization.yaml file in your Git repository and push the changes.
   The Argo CD pipeline detects the changes and begins the managed cluster deployment. You can push the changes to the SiteConfig CR and the PolicyGenTemplate CR simultaneously.
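Because the Namespace CR must not share a file with a PolicyGenTemplate CR, a quick check before you commit can catch this mistake. The following Bash function is a hypothetical pre-commit sketch and is not part of GitOps ZTP. It only inspects top-level kind: lines, so it assumes that each resource's kind key starts at column 0.

```bash
#!/usr/bin/env bash
# Hypothetical pre-commit check: fail if any given file defines both a
# Namespace and a PolicyGenTemplate. Only top-level "kind:" lines are checked.
check_no_mixed_kinds() {
  local rc=0 f
  for f in "$@"; do
    if grep -qE '^kind:[[:space:]]*Namespace[[:space:]]*$' "$f" &&
       grep -qE '^kind:[[:space:]]*PolicyGenTemplate[[:space:]]*$' "$f"; then
      echo "ERROR: $f contains both a Namespace and a PolicyGenTemplate CR" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

For example, run check_no_mixed_kinds *.yaml in the policies directory before you commit; a non-zero exit status identifies the offending files on stderr.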

10.1.5. Monitoring managed cluster policy deployment progress

The ArgoCD pipeline uses PolicyGenTemplate CRs in Git to generate the RHACM policies and then sync them to the hub cluster. You can monitor the progress of the managed cluster policy synchronization after the assisted service installs OpenShift Container Platform on the managed cluster.

Prerequisites

You have installed the OpenShift CLI (oc).
You have logged in to the hub cluster as a user with cluster-admin privileges.

Procedure

The Topology Aware Lifecycle Manager (TALM) applies the configuration policies that are bound to the cluster.
After the cluster installation is complete and the cluster becomes Ready, TALM automatically creates a ClusterGroupUpgrade CR for this cluster, with a list of ordered policies defined by the ran.openshift.io/ztp-deploy-wave annotations. The cluster's policies are applied in the order listed in the ClusterGroupUpgrade CR.
You can monitor the high-level progress of configuration policy reconciliation by using the following commands:
```
$ export CLUSTER=<clusterName>
```
```
$ oc get clustergroupupgrades -n ztp-install $CLUSTER -o jsonpath='{.status.conditions[-1:]}' | jq
```
Example output
```
{
  "lastTransitionTime": "2022-11-09T07:28:09Z",
  "message": "Remediating non-compliant policies",
  "reason": "InProgress",
  "status": "True",
  "type": "Progressing"
}
```

You can monitor the detailed cluster policy compliance status by using the RHACM dashboard or the command line.

To check policy compliance by using oc, run the following command:

```
$ oc get policies -n $CLUSTER
```

Example output

```
NAME                                                     REMEDIATION ACTION   COMPLIANCE STATE   AGE
ztp-common.common-config-policy                          inform               Compliant          3h42m
ztp-common.common-subscriptions-policy                   inform               NonCompliant       3h42m
ztp-group.group-du-sno-config-policy                     inform               NonCompliant       3h42m
ztp-group.group-du-sno-validator-du-policy               inform               NonCompliant       3h42m
ztp-install.example1-common-config-policy-pjz9s          enforce              Compliant          167m
ztp-install.example1-common-subscriptions-policy-zzd9k   enforce              NonCompliant       164m
ztp-site.example1-config-policy                          inform               NonCompliant       3h42m
ztp-site.example1-perf-policy                            inform               NonCompliant       3h42m
```
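When many policies are listed, it can help to filter the output down to the policies that still need attention. The following Bash function is a hypothetical sketch that reads the output of oc get policies -n $CLUSTER on stdin. It assumes the default column layout shown above (NAME, REMEDIATION ACTION, COMPLIANCE STATE, AGE) and reports a blank compliance state as Unknown.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read "oc get policies -n <namespace>" output on stdin,
# print each policy that is not Compliant, and return non-zero if any exist.
# Assumes the default columns: NAME, REMEDIATION ACTION, COMPLIANCE STATE, AGE.
report_noncompliant() {
  awk '
    NR == 1 { next }   # skip the header row
    {
      # A blank COMPLIANCE STATE shifts AGE into $3, so treat it as Unknown.
      state = ($3 == "Compliant" || $3 == "NonCompliant") ? $3 : "Unknown"
      if (state != "Compliant") { print $1 " " state; bad++ }
    }
    END { exit (bad > 0) }
  '
}
```

For example, oc get policies -n "$CLUSTER" | report_noncompliant lists the remaining policies. Because the exit status is non-zero while any policy is not compliant, the function can also serve as the condition of a polling loop.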

To check policy status from the RHACM web console, perform the following actions:
1. Click Governance → Find policies.
2. Click a cluster policy to check its status.

When all of the cluster policies become compliant, GitOps ZTP installation and configuration for the cluster is complete. The ztp-done label is added to the cluster.

In the reference configuration, the final policy that becomes compliant is the one defined in the *-validator-du-policy policy, for example, ztp-group.group-du-sno-validator-du-policy. This policy, when compliant on a cluster, ensures that all cluster configuration, Operator installation, and Operator configuration is complete.

10.1.6. Validating the generation of configuration policy CRs

Policy custom resources (CRs) are generated in the same namespace as the PolicyGenTemplate from which they are created. The same troubleshooting flow applies to all policy CRs generated from a PolicyGenTemplate regardless of whether they are ztp-common, ztp-group, or ztp-site based, as shown using the following commands:

```
$ export NS=<namespace>
```
```
$ oc get policy -n $NS
```

The expected set of policy-wrapped CRs should be displayed.

If the policies failed synchronization, use the following troubleshooting steps.

Procedure

To display detailed information about the policies, run the following command:
```
$ oc describe -n openshift-gitops application policies
```

Check for Status: Conditions: to show the error logs. For example, setting the fileName field of a sourceFiles entry to a file that does not exist in the source-crs/ directory generates the following error:

```
Status:
  Conditions:
    Last Transition Time:  2021-11-26T17:21:39Z
    Message:               rpc error: code = Unknown desc = `kustomize build /tmp/https___git.com/ran-sites/policies/ --enable-alpha-plugins` failed exit status 1: 2021/11/26 17:21:40 Error could not find test.yaml under source-crs/: no such file or directory Error: failure in plugin configured via /tmp/kust-plugin-config-52463179; exit status 1: exit status 1
    Type:  ComparisonError
```
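When the error message is long, the missing file name can be extracted directly. The following Bash function is a hypothetical sketch that matches only the "could not find <file> under source-crs/" wording shown in the example above; other errors produce no output.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the file name from a
# "could not find <file> under source-crs/" error read on stdin.
# Lines that do not contain this wording produce no output.
missing_source_cr() {
  sed -n 's/.*could not find \([^ ]*\) under source-crs\/.*/\1/p'
}
```

For example, oc describe -n openshift-gitops application policies | missing_source_cr is expected to print test.yaml for the error shown above.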

Check for Status: Sync:. If there are log errors at Status: Conditions:, the Status: Sync: shows Unknown or Error:

```
Status:
  Sync:
    Compared To:
      Destination:
        Namespace:  policies-sub
        Server:     https://kubernetes.default.svc
      Source:
        Path:             policies
        Repo URL:         https://git.com/ran-sites/policies/.git
        Target Revision:  master
    Status:               Error
```

When Red Hat Advanced Cluster Management (RHACM) recognizes that policies apply to a ManagedCluster object, the policy CR objects are applied to the cluster namespace. Check to see if the policies were copied to the cluster namespace:

```
$ oc get policy -n $CLUSTER
```

Example output

```
NAME                                         REMEDIATION ACTION   COMPLIANCE STATE   AGE
ztp-common.common-config-policy              inform               Compliant          13d
ztp-common.common-subscriptions-policy       inform               Compliant          13d
ztp-group.group-du-sno-config-policy         inform               Compliant          13d
ztp-group.group-du-sno-validator-du-policy   inform               Compliant          13d
ztp-site.example-sno-config-policy           inform               Compliant          13d
```

RHACM copies all applicable policies into the cluster namespace. The copied policy names have the format: <PolicyGenTemplate.Namespace>.<PolicyGenTemplate.Name>-<policyName>.
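Because the copied name embeds the source namespace, you can recover the value to use as $NS in the troubleshooting steps that follow. The following Bash function is a hypothetical sketch. It splits on the first dot, which is safe because Kubernetes namespace names are DNS labels and cannot contain dots.

```bash
#!/usr/bin/env bash
# Hypothetical helper: split a copied policy name of the form
# <PolicyGenTemplate.Namespace>.<PolicyGenTemplate.Name>-<policyName>
# into "<namespace> <policy>". Namespace names cannot contain dots,
# so everything before the first dot is the namespace.
split_copied_policy() {
  local copied=$1
  case "$copied" in
    *.*) ;;
    *) echo "not a copied policy name: $copied" >&2; return 1 ;;
  esac
  printf '%s %s\n' "${copied%%.*}" "${copied#*.}"
}
```

For example, read -r NS POLICY < <(split_copied_policy ztp-group.group-du-sno-config-policy) sets NS to ztp-group and POLICY to group-du-sno-config-policy.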

Check the placement rule for any policies not copied to the cluster namespace. The matchSelector in the PlacementRule for those policies should match labels on the ManagedCluster object:
```
$ oc get PlacementRule -n $NS
```
Note the name of the PlacementRule for the missing policy (common, group, or site), and then inspect it by using the following command:
```
$ oc get PlacementRule -n $NS <placement_rule_name> -o yaml
```
- The status.decisions field should include your cluster name.
- The key-value pair of the matchSelector in the spec must match the labels on your managed cluster.

Check the labels on the ManagedCluster object by using the following command:

```
$ oc get ManagedCluster $CLUSTER -o jsonpath='{.metadata.labels}' | jq
```
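Comparing labels by eye is error-prone when a cluster has many labels. The following Bash functions are a hypothetical sketch: cluster_labels prints the ManagedCluster labels as key=value lines by using an oc go-template, and labels_satisfy checks that every key=value pair that you pass, for example the pairs from the matchSelector, is present. An empty label value, such as group-du-sno: "", is written as group-du-sno=.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the labels of a ManagedCluster as key=value
# lines. Requires oc and access to the hub cluster.
cluster_labels() {
  oc get managedcluster "$1" \
    -o go-template='{{range $k, $v := .metadata.labels}}{{$k}}={{$v}}{{"\n"}}{{end}}'
}

# Hypothetical helper: read key=value label lines on stdin and check that
# every required key=value pair given as an argument is present.
labels_satisfy() {
  local labels missing=0 pair
  labels=$(cat)
  for pair in "$@"; do
    if ! grep -qxF -- "$pair" <<<"$labels"; then
      echo "missing label: $pair" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, cluster_labels "$CLUSTER" | labels_satisfy common=true du-profile=latest reports each required label that the cluster lacks and returns a non-zero status if any are missing.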

Check to see what policies are compliant by using the following command:
```
$ oc get policy -n $CLUSTER
```
If the Namespace, OperatorGroup, and Subscription policies are compliant but the Operator configuration policies are not, it is likely that the Operators did not install on the managed cluster. This causes the Operator configuration policies to fail to apply because the CRD is not yet applied to the spoke.

10.1.7. Restarting policy reconciliation

You can restart policy reconciliation when unexpected compliance issues occur, for example, when the ClusterGroupUpgrade custom resource (CR) has timed out.

Procedure

A ClusterGroupUpgrade CR is generated in the namespace ztp-install by the Topology Aware Lifecycle Manager after the managed cluster becomes Ready:
```
$ export CLUSTER=<clusterName>
```
```
$ oc get clustergroupupgrades -n ztp-install $CLUSTER
```
If there are unexpected issues and the policies fail to become compliant within the configured timeout (the default is 4 hours), the status of the ClusterGroupUpgrade CR shows UpgradeTimedOut:
```
$ oc get clustergroupupgrades -n ztp-install $CLUSTER -o jsonpath='{.status.conditions[?(@.type=="Ready")]}'
```
A ClusterGroupUpgrade CR in the UpgradeTimedOut state automatically restarts its policy reconciliation every hour. If you have changed your policies, you can start a retry immediately by deleting the existing ClusterGroupUpgrade CR. This triggers the automatic creation of a new ClusterGroupUpgrade CR that begins reconciling the policies immediately:
```
$ oc delete clustergroupupgrades -n ztp-install $CLUSTER
```
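To avoid deleting a ClusterGroupUpgrade CR that is still progressing, you can make the deletion conditional on the timed-out state. The following Bash functions are a hypothetical sketch: cgu_timed_out searches the Ready condition output for the string UpgradeTimedOut, and restart_if_timed_out wraps the oc commands shown above. Deleting the CR after the cluster has the ztp-done label does not cause TALM to create a new one, so use this only while GitOps ZTP is still in progress.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read the Ready condition (as printed by the jsonpath
# command above) on stdin and return 0 if it mentions UpgradeTimedOut.
cgu_timed_out() {
  grep -q 'UpgradeTimedOut'
}

# Hypothetical helper: delete the ClusterGroupUpgrade CR for a cluster only
# if it has timed out, so that TALM creates a new one. Requires oc.
restart_if_timed_out() {
  local cluster=$1
  if oc get clustergroupupgrades -n ztp-install "$cluster" \
       -o jsonpath='{.status.conditions[?(@.type=="Ready")]}' | cgu_timed_out; then
    oc delete clustergroupupgrades -n ztp-install "$cluster"
  else
    echo "ClusterGroupUpgrade $cluster has not timed out; nothing to do"
  fi
}
```

For example, restart_if_timed_out "$CLUSTER" deletes the CR only when the Ready condition reports UpgradeTimedOut.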

When the ClusterGroupUpgrade CR completes with status UpgradeCompleted and the managed cluster has the ztp-done label applied, you can make additional configuration changes by using PolicyGenTemplate CRs. However, deleting the existing ClusterGroupUpgrade CR does not cause TALM to generate a new CR.

At this point, GitOps ZTP has completed its interaction with the cluster. Treat any further interactions as updates, and create a new ClusterGroupUpgrade CR to remediate the policies.

10.1.8. Changing applied managed cluster CRs using policies

You can remove content from a custom resource (CR) that is deployed in a managed cluster through a policy.

By default, all Policy CRs created from a PolicyGenTemplate CR have the complianceType field set to musthave. A musthave policy without the removed content is still compliant because the CR on the managed cluster has all the specified content. With this configuration, when you remove content from a CR, TALM removes the content from the policy but the content is not removed from the CR on the managed cluster.

With the complianceType field set to mustonlyhave, the policy ensures that the CR on the cluster is an exact match of what is specified in the policy.

Prerequisites

You have installed the OpenShift CLI (oc).
You have logged in to the hub cluster as a user with cluster-admin privileges.
You have deployed a managed cluster from a hub cluster running RHACM.
You have installed Topology Aware Lifecycle Manager on the hub cluster.

Procedure

Remove the content that you no longer need from the affected CRs. In this example, the disableDrain: false line was removed from the SriovOperatorConfig CR.

Example CR

```
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovOperatorConfig
metadata:
  name: default
  namespace: openshift-sriov-network-operator
spec:
  configDaemonNodeSelector:
    "node-role.kubernetes.io/$mcp": ""
  disableDrain: true
  enableInjector: true
  enableOperatorWebhook: true
```

Change the complianceType of the affected policies to mustonlyhave in the group-du-sno-ranGen.yaml file.
Example YAML
```
- fileName: SriovOperatorConfig.yaml
  policyName: "config-policy"
  complianceType: mustonlyhave
```
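To confirm which entries in a PolicyGenTemplate file use mustonlyhave, and which fall back to the default musthave, you can list the effective compliance type for each source file. The following Bash function is a hypothetical, line-oriented sketch that assumes each entry starts with a - fileName: line and that complianceType, when present, appears inside that entry.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "<fileName> <complianceType>" for each
# sourceFiles entry, using musthave (the default) when it is not set.
# Line-oriented sketch, not a YAML parser.
show_compliance_types() {
  awk '
    # Print the pending entry, then reset the per-entry state.
    function flush() {
      if (file != "") print file " " (ct != "" ? ct : "musthave")
      file = ""; ct = ""
    }
    /^[ \t]*- fileName:/ {
      flush()
      file = $0; sub(/^[^:]*:[ \t]*/, "", file); gsub(/"/, "", file); sub(/[ \t]+$/, "", file)
      next
    }
    /^[ \t]*complianceType:/ {
      ct = $0; sub(/^[^:]*:[ \t]*/, "", ct); gsub(/"/, "", ct); sub(/[ \t]+$/, "", ct)
    }
    END { flush() }
  ' "${1:-/dev/stdin}"
}
```

For example, show_compliance_types group-du-sno-ranGen.yaml is expected to list SriovOperatorConfig.yaml with mustonlyhave after the change above, and the other entries with musthave.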

Create a ClusterGroupUpgrade CR and specify the clusters that must receive the CR changes:

Example ClusterGroupUpgrade CR

```
apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
  name: cgu-remove
  namespace: default
spec:
  managedPolicies:
    - ztp-group.group-du-sno-config-policy
  enable: false
  clusters:
  - spoke1
  - spoke2
  remediationStrategy:
    maxConcurrency: 2
    timeout: 240
  batchTimeoutAction:
```

Create the ClusterGroupUpgrade CR by running the following command:
```
$ oc create -f cgu-remove.yaml
```
When you are ready to apply the changes, for example, during an appropriate maintenance window, change the value of the spec.enable field to true by running the following command:
```
$ oc --namespace=default patch clustergroupupgrade.ran.openshift.io/cgu-remove \
--patch '{"spec":{"enable":true}}' --type=merge
```

Verification

Check the status of the policies by running the following command:

```
$ oc get policies --all-namespaces
```

Example output

```
NAMESPACE   NAME                                                   REMEDIATION ACTION   COMPLIANCE STATE   AGE
default     cgu-ztp-group.group-du-sno-config-policy               enforce                                 17m
default     ztp-group.group-du-sno-config-policy                   inform               NonCompliant       15h
```

When the COMPLIANCE STATE of the policy is Compliant, it means that the CR is updated and the unwanted content is removed.

Check that the unwanted content is removed from the CR on the targeted clusters by running the following command on the managed clusters:
```
$ oc get <kind> <changed_cr_name> -o yaml
```
If the output no longer contains the removed content, the change has been applied on the managed cluster.

10.1.9. Indication of done for GitOps ZTP installations

GitOps Zero Touch Provisioning (ZTP) simplifies the process of checking the GitOps ZTP installation status for a cluster. The GitOps ZTP status moves through three phases: cluster installation, cluster configuration, and GitOps ZTP done.

Cluster installation phase

The cluster installation phase is shown by the ManagedClusterJoined and ManagedClusterAvailable conditions in the ManagedCluster CR. If the ManagedCluster CR does not have these conditions, or if either condition is set to False, the cluster is still in the installation phase. Additional details about installation are available from the AgentClusterInstall and ClusterDeployment CRs. For more information, see "Troubleshooting GitOps ZTP".

Cluster configuration phase

The cluster configuration phase is shown by a ztp-running label applied to the ManagedCluster CR for the cluster.

GitOps ZTP done

Cluster installation and configuration is complete in the GitOps ZTP done phase. This is shown by the removal of the ztp-running label and addition of the ztp-done label to the ManagedCluster CR. The ztp-done label shows that the configuration has been applied and the baseline DU configuration has completed cluster tuning.
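Because the configuration and done phases are reflected in labels on the ManagedCluster CR, a script can report them. The following Bash function is a hypothetical sketch that reads labels as key=value lines on stdin. It inspects only the ztp-running and ztp-done labels and reports installation when neither is present, which is an approximation: confirm the installation phase by checking the ManagedClusterJoined and ManagedClusterAvailable conditions.

```bash
#!/usr/bin/env bash
# Hypothetical helper: classify the GitOps ZTP phase from ManagedCluster
# labels given as key=value lines on stdin, for example the output of:
#   oc get managedcluster "$CLUSTER" \
#     -o go-template='{{range $k, $v := .metadata.labels}}{{$k}}={{$v}}{{"\n"}}{{end}}'
# Only ztp-done and ztp-running are checked; "installation" is inferred.
ztp_phase() {
  local labels
  labels=$(cat)
  if grep -q '^ztp-done=' <<<"$labels"; then
    echo "done"
  elif grep -q '^ztp-running=' <<<"$labels"; then
    echo "configuration"
  else
    echo "installation"
  fi
}
```

A cluster that carries the ztp-done label is reported as done, regardless of any other labels.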

The change to the GitOps ZTP done state is conditional on the compliant state of a Red Hat Advanced Cluster Management (RHACM) validator inform policy. This policy captures the existing criteria for a completed installation and validates that it moves to a compliant state only when GitOps ZTP provisioning of the managed cluster is complete.

The validator inform policy ensures the configuration of the cluster is fully applied and Operators have completed their initialization. The policy validates the following:

The target MachineConfigPool contains the expected entries and has finished updating. All nodes are available and not degraded.
The SR-IOV Operator has completed initialization as indicated by at least one SriovNetworkNodeState with syncStatus: Succeeded.
The PTP Operator daemon set exists.
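You can spot-check the SR-IOV criterion yourself on the managed cluster. The following Bash function is a hypothetical sketch that reads whitespace-separated syncStatus values on stdin and succeeds if at least one of them is Succeeded. It does not reproduce the full validator policy.

```bash
#!/usr/bin/env bash
# Hypothetical helper: return 0 if at least one whitespace-separated
# syncStatus value on stdin is "Succeeded". Run on the managed cluster, e.g.:
#   oc get sriovnetworknodestates -n openshift-sriov-network-operator \
#     -o jsonpath='{.items[*].status.syncStatus}' | sriov_initialized
sriov_initialized() {
  # Put one value per line, then look for an exact "Succeeded" line.
  tr -s '[:space:]' '\n' | grep -qx 'Succeeded'
}
```

A non-zero exit status means that no SriovNetworkNodeState has reported syncStatus: Succeeded yet.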

10.2. Advanced managed cluster configuration with PolicyGenTemplate resources

You can use PolicyGenTemplate CRs to deploy custom functionality in your managed clusters.

Important

For more information about PolicyGenerator resources, see the RHACM Policy Generator documentation.

10.2.1. Deploying additional changes to clusters

If you require cluster configuration changes outside of the base GitOps Zero Touch Provisioning (ZTP) pipeline configuration, there are three options:

Apply the additional configuration after the GitOps ZTP pipeline is complete: When the GitOps ZTP pipeline deployment is complete, the deployed cluster is ready for application workloads. At this point, you can install additional Operators and apply configurations specific to your requirements. Ensure that additional configurations do not negatively affect the performance of the platform or allocated CPU budget.
Add content to the GitOps ZTP library: The base source custom resources (CRs) that you deploy with the GitOps ZTP pipeline can be augmented with custom content as required.
Create extra manifests for the cluster installation: Extra manifests are applied during installation and make the installation process more efficient.

Important

Providing additional source CRs or modifying existing source CRs can significantly impact the performance or CPU profile of OpenShift Container Platform.

10.2.2. Using PolicyGenTemplate CRs to override source CRs content

PolicyGenTemplate custom resources (CRs) allow you to overlay additional configuration details on top of the base source CRs provided with the GitOps plugin in the ztp-site-generate container. You can think of PolicyGenTemplate CRs as a logical merge or patch to the base CR. Use PolicyGenTemplate CRs to update a single field of the base CR, or overlay the entire contents of the base CR. You can update values and insert fields that are not in the base CR.

The following example procedure describes how to update fields in the generated PerformanceProfile CR for the reference configuration based on the PolicyGenTemplate CR in the group-du-sno-ranGen.yaml file. Use the procedure as a basis for modifying other parts of the PolicyGenTemplate based on your requirements.

Prerequisites

Create a Git repository where you manage your custom site configuration data. The repository must be accessible from the hub cluster and be defined as a source repository for Argo CD.

Procedure

Review the baseline source CR for existing content. You can review the source CRs listed in the reference PolicyGenTemplate CRs by extracting them from the GitOps Zero Touch Provisioning (ZTP) container.
1. Create the ./out folder:
  $ mkdir -p ./out
2. Extract the source CRs:
  $ podman run --log-driver=none --rm registry.redhat.io/openshift4/ztp-site-generate-rhel8:v4.17.1 extract /home/ztp --tar | tar x -C ./out

Review the baseline PerformanceProfile CR in ./out/source-crs/PerformanceProfile.yaml:

apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  name: $name
  annotations:
    ran.openshift.io/ztp-deploy-wave: "10"
spec:
  additionalKernelArgs:
  - "idle=poll"
  - "rcupdate.rcu_normal_after_boot=0"
  cpu:
    isolated: $isolated
    reserved: $reserved
  hugepages:
    defaultHugepagesSize: $defaultHugepagesSize
    pages:
      - size: $size
        count: $count
        node: $node
  machineConfigPoolSelector:
    pools.operator.machineconfiguration.openshift.io/$mcp: ""
  net:
    userLevelNetworking: true
  nodeSelector:
    node-role.kubernetes.io/$mcp: ''
  numa:
    topologyPolicy: "restricted"
  realTimeKernel:
    enabled: true

Note

Any fields in the source CR which contain $… are removed from the generated CR if they are not provided in the PolicyGenTemplate CR.

Update the PolicyGenTemplate entry for PerformanceProfile in the group-du-sno-ranGen.yaml reference file. The following example PolicyGenTemplate CR stanza supplies appropriate CPU specifications, sets the hugepages configuration, and adds a new field that sets globallyDisableIrqLoadBalancing to false.

- fileName: PerformanceProfile.yaml
  policyName: "config-policy"
  metadata:
    name: openshift-node-performance-profile
  spec:
    cpu:
      # These must be tailored for the specific hardware platform
      isolated: "2-19,22-39"
      reserved: "0-1,20-21"
    hugepages:
      defaultHugepagesSize: 1G
      pages:
        - size: 1G
          count: 10
    globallyDisableIrqLoadBalancing: false

Commit the PolicyGenTemplate change in Git, and then push to the Git repository being monitored by the GitOps ZTP Argo CD application.

Example output

The GitOps ZTP application generates an RHACM policy that contains the generated PerformanceProfile CR. The contents of that CR are derived by merging the metadata and spec contents from the PerformanceProfile entry in the PolicyGenTemplate onto the source CR. The resulting CR has the following content:

---
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
    name: openshift-node-performance-profile
spec:
    additionalKernelArgs:
        - idle=poll
        - rcupdate.rcu_normal_after_boot=0
    cpu:
        isolated: 2-19,22-39
        reserved: 0-1,20-21
    globallyDisableIrqLoadBalancing: false
    hugepages:
        defaultHugepagesSize: 1G
        pages:
            - count: 10
              size: 1G
    machineConfigPoolSelector:
        pools.operator.machineconfiguration.openshift.io/master: ""
    net:
        userLevelNetworking: true
    nodeSelector:
        node-role.kubernetes.io/master: ""
    numa:
        topologyPolicy: restricted
    realTimeKernel:
        enabled: true
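The merge described above can be modeled as a recursive dictionary overlay. The following Python sketch is a simplified model, not the policyGen source code. It assumes that nested mappings are merged key by key and that a scalar or list supplied in the PolicyGenTemplate entry replaces the source value outright. The later additionalKernelArgs example in this chapter, which repeats existing entries with `# ...`, is consistent with list replacement. The `overlay` function name is hypothetical.

```python
"""Simplified model of overlaying a PolicyGenTemplate entry onto a source CR.

Assumption (not the policyGen source code): mappings merge key by key, while a
scalar or list supplied in the PolicyGenTemplate replaces the source value.
"""
import copy
from typing import Any


def overlay(base: Any, patch: Any) -> Any:
    """Return a new object with `patch` merged on top of `base`; inputs are not modified."""
    if isinstance(base, dict) and isinstance(patch, dict):
        merged = copy.deepcopy(base)
        for key, value in patch.items():
            # Recurse when the key already exists; otherwise insert the new field.
            merged[key] = overlay(base[key], value) if key in base else copy.deepcopy(value)
        return merged
    # Scalars and lists from the PolicyGenTemplate win outright.
    return copy.deepcopy(patch)


source_cr = {
    "metadata": {"name": "$name"},
    "spec": {
        "additionalKernelArgs": ["idle=poll", "rcupdate.rcu_normal_after_boot=0"],
        "cpu": {"isolated": "$isolated", "reserved": "$reserved"},
        "realTimeKernel": {"enabled": True},
    },
}
pgt_entry = {
    "metadata": {"name": "openshift-node-performance-profile"},
    "spec": {
        "cpu": {"isolated": "2-19,22-39", "reserved": "0-1,20-21"},
        "globallyDisableIrqLoadBalancing": False,  # field not present in the source CR
    },
}

generated = overlay(source_cr, pgt_entry)
print(generated["spec"]["cpu"])  # {'isolated': '2-19,22-39', 'reserved': '0-1,20-21'}
```

Fields that the entry does not mention, such as additionalKernelArgs and realTimeKernel, pass through unchanged. This matches the generated CR shown above.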

Note

In the /source-crs folder that you extract from the ztp-site-generate container, the $ syntax is not used for template substitution, despite what the syntax suggests. Instead, if the policyGen tool encounters a string with the $ prefix and you do not specify a value for that field in the related PolicyGenTemplate CR, the field is omitted from the output CR entirely.

An exception to this is the $mcp variable in /source-crs YAML files, which is substituted with the value of mcp from the PolicyGenTemplate CR. For example, in example/policygentemplates/group-du-standard-ranGen.yaml, the value for mcp is worker:

spec:
  bindingRules:
    group-du-standard: ""
  mcp: "worker"

The policyGen tool replaces instances of $mcp with worker in the output CRs.
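The placeholder rules above can be expressed as a small tree walk. The following Python sketch is a simplified model under stated assumptions, not the policyGen implementation. It assumes that a string value that still starts with $ after the overlay is dropped together with its key or list item. It also assumes that $mcp is replaced with the PolicyGenTemplate mcp value in both keys and values, as in the machineConfigPoolSelector and nodeSelector keys of the PerformanceProfile source CR. The `resolve` function and `_OMIT` sentinel are hypothetical names.

```python
"""Simplified model of how $-prefixed placeholders in source CRs are resolved.

Assumptions: a string value that still starts with "$" after the overlay is
dropped together with its key or list item, and "$mcp" is instead replaced
with the PolicyGenTemplate mcp value in both keys and values.
"""
from typing import Any

_OMIT = object()  # sentinel meaning "leave this field out of the output CR"


def resolve(node: Any, mcp: str) -> Any:
    if isinstance(node, dict):
        out = {}
        for key, value in node.items():
            resolved = resolve(value, mcp)
            if resolved is not _OMIT:
                new_key = key.replace("$mcp", mcp) if isinstance(key, str) else key
                out[new_key] = resolved
        return out
    if isinstance(node, list):
        items = (resolve(item, mcp) for item in node)
        return [item for item in items if item is not _OMIT]
    if isinstance(node, str):
        if "$mcp" in node:
            return node.replace("$mcp", mcp)
        if node.startswith("$"):
            return _OMIT  # unset placeholder such as $node
    return node


merged_cr = {
    "spec": {
        "hugepages": {"pages": [{"size": "1G", "count": 10, "node": "$node"}]},
        "machineConfigPoolSelector": {"pools.operator.machineconfiguration.openshift.io/$mcp": ""},
        "nodeSelector": {"node-role.kubernetes.io/$mcp": ""},
    }
}
print(resolve(merged_cr, mcp="worker")["spec"]["nodeSelector"])  # {'node-role.kubernetes.io/worker': ''}
```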

10.2.3. Adding custom content to the GitOps ZTP pipeline

Perform the following procedure to add new content to the GitOps ZTP pipeline.

Procedure

Create a subdirectory named source-crs in the directory that contains the kustomization.yaml file for the PolicyGenTemplate custom resource (CR).

Add your user-provided CRs to the source-crs subdirectory, as shown in the following example:

example
└── policygentemplates
    ├── dev.yaml
    ├── kustomization.yaml
    ├── mec-edge-sno1.yaml
    ├── sno.yaml
    └── source-crs
        ├── PaoCatalogSource.yaml
        ├── PaoSubscription.yaml
        ├── custom-crs
        |   ├── apiserver-config.yaml
        |   └── disable-nic-lldp.yaml
        └── elasticsearch
            ├── ElasticsearchNS.yaml
            └── ElasticsearchOperatorGroup.yaml

The source-crs subdirectory must be in the same directory as the kustomization.yaml file.

Update the required PolicyGenTemplate CRs to include references to the content you added in the source-crs/custom-crs and source-crs/elasticsearch directories. For example:

apiVersion: ran.openshift.io/v1
kind: PolicyGenTemplate
metadata:
  name: "group-dev"
  namespace: "ztp-clusters"
spec:
  bindingRules:
    dev: "true"
  mcp: "master"
  sourceFiles:
    # These policies/CRs come from the internal container Image
    #Cluster Logging
    - fileName: ClusterLogNS.yaml
      remediationAction: inform
      policyName: "group-dev-cluster-log-ns"
    - fileName: ClusterLogOperGroup.yaml
      remediationAction: inform
      policyName: "group-dev-cluster-log-operator-group"
    - fileName: ClusterLogSubscription.yaml
      remediationAction: inform
      policyName: "group-dev-cluster-log-sub"
    #Local Storage Operator
    - fileName: StorageNS.yaml
      remediationAction: inform
      policyName: "group-dev-lso-ns"
    - fileName: StorageOperGroup.yaml
      remediationAction: inform
      policyName: "group-dev-lso-operator-group"
    - fileName: StorageSubscription.yaml
      remediationAction: inform
      policyName: "group-dev-lso-sub"
    #These are custom local policies that come from the source-crs directory in the git repo
    # Performance Addon Operator
    - fileName: PaoSubscriptionNS.yaml
      remediationAction: inform
      policyName: "group-dev-pao-ns"
    - fileName: PaoSubscriptionCatalogSource.yaml
      remediationAction: inform
      policyName: "group-dev-pao-cat-source"
      spec:
        image: <container_image_url>
    - fileName: PaoSubscription.yaml
      remediationAction: inform
      policyName: "group-dev-pao-sub"
    #Elasticsearch Operator
    - fileName: elasticsearch/ElasticsearchNS.yaml
      remediationAction: inform
      policyName: "group-dev-elasticsearch-ns"
    - fileName: elasticsearch/ElasticsearchOperatorGroup.yaml
      remediationAction: inform
      policyName: "group-dev-elasticsearch-operator-group"
    #Custom Resources
    - fileName: custom-crs/apiserver-config.yaml
      remediationAction: inform
      policyName: "group-dev-apiserver-config"
    - fileName: custom-crs/disable-nic-lldp.yaml
      remediationAction: inform
      policyName: "group-dev-disable-nic-lldp"

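For files in subdirectories of source-crs, such as elasticsearch/ElasticsearchNS.yaml and custom-crs/apiserver-config.yaml, set fileName to the path of the file relative to the /source-crs parent directory.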

Commit the PolicyGenTemplate change in Git, and then push to the Git repository that is monitored by the GitOps ZTP Argo CD policies application.

Update the ClusterGroupUpgrade CR to include the policies generated from the changed PolicyGenTemplate, and save it as cgu-test.yaml. The following example shows a cgu-test.yaml file.

apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
  name: custom-source-cr
  namespace: ztp-clusters
spec:
  managedPolicies:
    - group-dev-config-policy
  enable: true
  clusters:
  - cluster1
  remediationStrategy:
    maxConcurrency: 2
    timeout: 240
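The managedPolicies list needs the names of the generated policies, not the PolicyGenTemplate file name. The following Python sketch derives candidate names under one assumption: each policy is named `<metadata.name>-<policyName>`. This is the pattern given for the validator example later in this chapter, where group-du-sno-validator and du-policy produce group-du-sno-validator-du-policy. The `generated_policy_names` helper is hypothetical. Confirm the actual names on the hub cluster, for example by running `oc get policies -A`.

```python
"""List the RHACM policy names that a PolicyGenTemplate is expected to generate.

Assumption: each policy is named "<metadata.name>-<policyName>", matching the
validator example in which "group-du-sno-validator" and "du-policy" produce
"group-du-sno-validator-du-policy". Confirm the names on the hub cluster.
"""
from typing import Any, Dict, List


def generated_policy_names(pgt: Dict[str, Any]) -> List[str]:
    prefix = pgt["metadata"]["name"]
    names: List[str] = []
    for entry in pgt["spec"]["sourceFiles"]:
        name = f"{prefix}-{entry['policyName']}"
        if name not in names:  # several source files can share one policy
            names.append(name)
    return names


validator = {
    "metadata": {"name": "group-du-sno-validator"},
    "spec": {"sourceFiles": [{"fileName": "validatorCRs/informDuValidator.yaml", "policyName": "du-policy"}]},
}
print(generated_policy_names(validator))  # ['group-du-sno-validator-du-policy']
```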

Apply the updated ClusterGroupUpgrade CR by running the following command:
```
$ oc apply -f cgu-test.yaml
```

Verification

Check that the updates have succeeded by running the following command:

$ oc get cgu -A

Example output

NAMESPACE     NAME               AGE   STATE        DETAILS
ztp-clusters  custom-source-cr   6s    InProgress   Remediating non-compliant policies
ztp-install   cluster1           19h   Completed    All clusters are compliant with all the managed policies

10.2.4. Configuring policy compliance evaluation timeouts for PolicyGenTemplate CRs

Use Red Hat Advanced Cluster Management (RHACM) installed on a hub cluster to monitor and report on whether your managed clusters are compliant with applied policies. RHACM uses policy templates to apply predefined policy controllers and policies. Policy controllers are Kubernetes custom resource definition (CRD) instances.

You can override the default policy evaluation intervals with PolicyGenTemplate custom resources (CRs). You configure duration settings that define how long a ConfigurationPolicy CR can be in a state of policy compliance or non-compliance before RHACM re-evaluates the applied cluster policies.

The GitOps Zero Touch Provisioning (ZTP) policy generator generates ConfigurationPolicy CR policies with pre-defined policy evaluation intervals. The default value for the noncompliant state is 10 seconds. The default value for the compliant state is 10 minutes. To disable the evaluation interval, set the value to never.

Prerequisites

You have installed the OpenShift CLI (oc).
You have logged in to the hub cluster as a user with cluster-admin privileges.
You have created a Git repository where you manage your custom site configuration data.

Procedure

To configure the evaluation interval for all policies in a PolicyGenTemplate CR, set appropriate compliant and noncompliant values for the evaluationInterval field. For example:
```
spec:
  evaluationInterval:
    compliant: 30m
    noncompliant: 20s
```
Note
You can also set the compliant and noncompliant fields to never to stop evaluating the policy after it reaches a particular compliance state.

To configure the evaluation interval for an individual policy object in a PolicyGenTemplate CR, add the evaluationInterval field and set appropriate values. For example:

spec:
  sourceFiles:
    - fileName: SriovSubscription.yaml
      policyName: "sriov-sub-policy"
      evaluationInterval:
        compliant: never
        noncompliant: 10s
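To reason about the values in these examples, it can help to convert them to seconds. The following Python sketch assumes that evaluationInterval values are Go-style durations and handles only the h, m, and s units used in this section. It maps never to None. The helper name `interval_seconds` is hypothetical, and the defaults dictionary restates the GitOps ZTP defaults given earlier in this section.

```python
"""Convert evaluationInterval values such as "30m", "20s", or "never" to seconds.

Assumption: values are Go-style durations limited here to h, m, and s units
(enough for the examples in this section); "never" maps to None.
"""
import re
from typing import Optional

DEFAULT_INTERVALS = {"compliant": "10m", "noncompliant": "10s"}  # GitOps ZTP defaults
_PART = re.compile(r"(\d+)([hms])")
_SECONDS = {"h": 3600, "m": 60, "s": 1}


def interval_seconds(value: str) -> Optional[int]:
    if value == "never":
        return None  # the policy is not re-evaluated in this state
    parts = _PART.findall(value)
    # Reject empty strings and anything with characters outside <number><unit> pairs.
    if not parts or "".join(num + unit for num, unit in parts) != value:
        raise ValueError(f"unsupported interval: {value!r}")
    return sum(int(num) * _SECONDS[unit] for num, unit in parts)


print(interval_seconds("30m"), interval_seconds("20s"), interval_seconds("never"))  # 1800 20 None
```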

Commit the PolicyGenTemplate CRs files in the Git repository and push your changes.

Verification

Check that the managed spoke cluster policies are monitored at the expected intervals.

Get the pods that are running in the open-cluster-management-agent-addon namespace. Run the following command:

$ oc get pods -n open-cluster-management-agent-addon

Example output

NAME                                         READY   STATUS    RESTARTS        AGE
config-policy-controller-858b894c68-v4xdb    1/1     Running   22 (5d8h ago)   10d

Check that the applied policies are being evaluated at the expected interval in the logs for the config-policy-controller pod:

$ oc logs -n open-cluster-management-agent-addon config-policy-controller-858b894c68-v4xdb

Example output

2022-05-10T15:10:25.280Z       info   configuration-policy-controller controllers/configurationpolicy_controller.go:166      Skipping the policy evaluation due to the policy not reaching the evaluation interval  {"policy": "compute-1-config-policy-config"}
2022-05-10T15:10:25.280Z       info   configuration-policy-controller controllers/configurationpolicy_controller.go:166      Skipping the policy evaluation due to the policy not reaching the evaluation interval  {"policy": "compute-1-common-compute-1-catalog-policy-config"}
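When the log is long, you can summarize the skip messages per policy instead of reading them line by line. The following Python sketch assumes that log lines follow the format in the example output above, ending with a JSON object that contains a policy key. To use it, save the `oc logs` output to a file and pass its lines to `skipped_evaluations`. The function name is hypothetical.

```python
"""Count evaluation-skip messages per policy in config-policy-controller logs.

Assumption: log lines follow the format shown in the example output, ending
with a JSON object that contains a "policy" key.
"""
import json
from collections import Counter
from typing import Iterable

SKIP_MSG = "Skipping the policy evaluation due to the policy not reaching the evaluation interval"


def skipped_evaluations(lines: Iterable[str]) -> Counter:
    counts: Counter = Counter()
    for line in lines:
        if SKIP_MSG not in line:
            continue
        brace = line.find("{")
        if brace == -1:
            continue  # no structured fields on this line
        fields = json.loads(line[brace:])
        counts[fields.get("policy", "<unknown>")] += 1
    return counts


sample_log = [
    '2022-05-10T15:10:25.280Z       info   configuration-policy-controller controllers/configurationpolicy_controller.go:166      Skipping the policy evaluation due to the policy not reaching the evaluation interval  {"policy": "compute-1-config-policy-config"}',
    '2022-05-10T15:10:25.280Z       info   configuration-policy-controller controllers/configurationpolicy_controller.go:166      Skipping the policy evaluation due to the policy not reaching the evaluation interval  {"policy": "compute-1-common-compute-1-catalog-policy-config"}',
]
for policy, count in skipped_evaluations(sample_log).items():
    print(policy, count)
```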

10.2.5. Signalling GitOps ZTP cluster deployment completion with validator inform policies

Create a validator inform policy that signals when the GitOps Zero Touch Provisioning (ZTP) installation and configuration of the deployed cluster is complete. This policy can be used for deployments of single-node OpenShift clusters, three-node clusters, and standard clusters.

Procedure

Create a standalone PolicyGenTemplate custom resource (CR) that contains the source file validatorCRs/informDuValidator.yaml. You only need one standalone PolicyGenTemplate CR for each cluster type. For example, this CR applies a validator inform policy for single-node OpenShift clusters:
Example single-node cluster validator inform policy CR (group-du-sno-validator-ranGen.yaml)
```
apiVersion: ran.openshift.io/v1
kind: PolicyGenTemplate
metadata:
  name: "group-du-sno-validator" # 1
  namespace: "ztp-group" # 2
spec:
  bindingRules:
    group-du-sno: "" # 3
  bindingExcludedRules:
    ztp-done: "" # 4
  mcp: "master" # 5
  sourceFiles:
    - fileName: validatorCRs/informDuValidator.yaml
      remediationAction: inform # 6
      policyName: "du-policy" # 7
```
1: The name of the PolicyGenTemplate object. This name is also used as part of the names for the placementBinding, placementRule, and policy that are created in the requested namespace.
2: This value should match the namespace used in the group PolicyGenTemplate CRs.
3: The group-du-* label defined in bindingRules must exist in the SiteConfig files.
4: The label defined in bindingExcludedRules must be ztp-done:. The ztp-done label is used in coordination with the Topology Aware Lifecycle Manager.
5: mcp defines the MachineConfigPool object that is used in the source file validatorCRs/informDuValidator.yaml. It should be master for single-node and three-node cluster deployments and worker for standard cluster deployments.
6: Optional. The default value is inform.
7: This value is used as part of the name for the generated RHACM policy. The generated validator policy for the single-node example is group-du-sno-validator-du-policy.
Commit the PolicyGenTemplate CR file in your Git repository and push the changes.
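The combination of bindingRules and bindingExcludedRules determines which clusters receive the validator policy. Once a cluster carries the ztp-done label, the policy no longer binds to it. The following Python sketch approximates that selection under a simplifying assumption: a cluster matches when it has every bindingRules label with the same value and none of the bindingExcludedRules label keys. RHACM evaluates the real placement on the hub cluster. The `is_bound` function is hypothetical.

```python
"""Approximate whether a PolicyGenTemplate's policies bind to a cluster.

Simplifying assumption: a cluster matches when it carries every bindingRules
label with the same value and none of the bindingExcludedRules label keys.
RHACM evaluates the real placement on the hub cluster.
"""
from typing import Dict


def is_bound(
    cluster_labels: Dict[str, str],
    binding_rules: Dict[str, str],
    excluded_rules: Dict[str, str],
) -> bool:
    included = all(cluster_labels.get(key) == value for key, value in binding_rules.items())
    excluded = any(key in cluster_labels for key in excluded_rules)
    return included and not excluded


validator_rules = {"group-du-sno": ""}
validator_excluded = {"ztp-done": ""}
print(is_bound({"group-du-sno": ""}, validator_rules, validator_excluded))  # True
print(is_bound({"group-du-sno": "", "ztp-done": ""}, validator_rules, validator_excluded))  # False
```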

10.2.6. Configuring power states using PolicyGenTemplate CRs

For low latency and high-performance edge deployments, it is necessary to disable or limit C-states and P-states. With this configuration, the CPU runs at a constant frequency, typically the maximum turbo frequency, so that it always runs at its maximum speed. This gives workloads the best performance and the lowest latency, but it also leads to the highest power consumption, which might not be necessary for all workloads.

Workloads can be classified as critical or non-critical. Critical workloads require disabled C-state and P-state settings for high performance and low latency, while non-critical workloads use C-state and P-state settings for power savings at the expense of some latency and performance. You can configure the following three power states using GitOps Zero Touch Provisioning (ZTP):

High-performance mode provides ultra-low latency at the highest power consumption.
Performance mode provides low latency at a relatively high power consumption.
Power saving mode balances reduced power consumption with increased latency.

The default configuration is performance mode, which provides low latency.

PolicyGenTemplate custom resources (CRs) allow you to overlay additional configuration details onto the base source CRs provided with the GitOps plugin in the ztp-site-generate container.

Configure the power states by updating the workloadHints fields in the generated PerformanceProfile CR for the reference configuration, based on the PolicyGenTemplate CR in the group-du-sno-ranGen.yaml file.
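The three procedures that follow differ only in the workloadHints values that they set. The following Python sketch collects those combinations in one table so you can check which power state an existing PerformanceProfile entry requests. The mode names used as dictionary keys are labels chosen for this example, not API values, and `mode_for` is a hypothetical helper.

```python
"""workloadHints combinations used by the three power state procedures."""
from typing import Dict, Optional

WORKLOAD_HINTS: Dict[str, Dict[str, bool]] = {
    "performance": {"realTime": True, "highPowerConsumption": False, "perPodPowerManagement": False},
    "high-performance": {"realTime": True, "highPowerConsumption": True, "perPodPowerManagement": False},
    "power-saving": {"realTime": True, "highPowerConsumption": False, "perPodPowerManagement": True},
}


def mode_for(hints: Dict[str, bool]) -> Optional[str]:
    """Return the label of the matching power state, or None if no combination matches."""
    for mode, expected in WORKLOAD_HINTS.items():
        if hints == expected:
            return mode
    return None


print(mode_for({"realTime": True, "highPowerConsumption": False, "perPodPowerManagement": False}))  # performance
```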

The following common prerequisites apply to configuring all three power states.

Prerequisites

You have created a Git repository where you manage your custom site configuration data. The repository must be accessible from the hub cluster and be defined as a source repository for Argo CD.
You have followed the procedure described in "Preparing the GitOps ZTP site configuration repository".

10.2.6.1. Configuring performance mode using PolicyGenTemplate CRs

Follow this example to set performance mode by updating the workloadHints fields in the generated PerformanceProfile CR for the reference configuration, based on the PolicyGenTemplate CR in the group-du-sno-ranGen.yaml file.

Performance mode provides low latency at a relatively high power consumption.

Prerequisites

You have configured the BIOS with performance-related settings by following the guidance in "Configuring host firmware for low latency and high performance".

Procedure

Update the PolicyGenTemplate entry for PerformanceProfile in the group-du-sno-ranGen.yaml reference file in out/argocd/example/policygentemplates/ as follows to set performance mode.

- fileName: PerformanceProfile.yaml
  policyName: "config-policy"
  metadata:
  # ...
  spec:
    # ...
    workloadHints:
         realTime: true
         highPowerConsumption: false
         perPodPowerManagement: false

Commit the PolicyGenTemplate change in Git, and then push to the Git repository being monitored by the GitOps ZTP Argo CD application.

10.2.6.2. Configuring high-performance mode using PolicyGenTemplate CRs

Follow this example to set high-performance mode by updating the workloadHints fields in the generated PerformanceProfile CR for the reference configuration, based on the PolicyGenTemplate CR in the group-du-sno-ranGen.yaml file.

High-performance mode provides ultra-low latency at the highest power consumption.

Prerequisites

You have configured the BIOS with performance-related settings by following the guidance in "Configuring host firmware for low latency and high performance".

Procedure

Update the PolicyGenTemplate entry for PerformanceProfile in the group-du-sno-ranGen.yaml reference file in out/argocd/example/policygentemplates/ as follows to set high-performance mode.

- fileName: PerformanceProfile.yaml
  policyName: "config-policy"
  metadata:
  #  ...
  spec:
  #  ...
    workloadHints:
         realTime: true
         highPowerConsumption: true
         perPodPowerManagement: false

Commit the PolicyGenTemplate change in Git, and then push to the Git repository being monitored by the GitOps ZTP Argo CD application.


10.2.6.3. Configuring power saving mode using PolicyGenTemplate CRs

Follow this example to set power saving mode by updating the workloadHints fields in the generated PerformanceProfile CR for the reference configuration, based on the PolicyGenTemplate CR in the group-du-sno-ranGen.yaml file.

The power saving mode balances reduced power consumption with increased latency.

Prerequisites

You have enabled C-states and OS-controlled P-states in the BIOS.

Procedure

Update the PolicyGenTemplate entry for PerformanceProfile in the group-du-sno-ranGen.yaml reference file in out/argocd/example/policygentemplates/ as follows to configure power saving mode. As a recommended practice, set the CPU governor for power saving mode in the additionalKernelArgs field.
```
- fileName: PerformanceProfile.yaml
  policyName: "config-policy"
  metadata:
  # ...
  spec:
    # ...
    workloadHints:
      realTime: true
      highPowerConsumption: false
      perPodPowerManagement: true
    # ...
    additionalKernelArgs:
      # ... existing kernel arguments
      - "cpufreq.default_governor=schedutil" # 1
```
1: The schedutil governor is recommended. Other governors that you can use include ondemand and powersave.
Commit the PolicyGenTemplate change in Git, and then push to the Git repository being monitored by the GitOps ZTP Argo CD application.

Verification

Select a worker node in your deployed cluster from the list of nodes identified by using the following command:
```
$ oc get nodes
```
Log in to the node by using the following command:
```
$ oc debug node/<node-name>
```
Replace <node-name> with the name of the node you want to verify the power state on.
Set /host as the root directory within the debug shell. The debug pod mounts the host’s root file system in /host within the pod. By changing the root directory to /host, you can run binaries contained in the host’s executable paths as shown in the following example:
```
# chroot /host
```
Run the following command to verify the applied power state:
```
# cat /proc/cmdline
```

Expected output

For power saving mode, the output includes intel_pstate=passive.
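If you check many nodes, you can parse the kernel command line instead of scanning it by eye. The following Python sketch takes the text that `cat /proc/cmdline` prints and reports whether intel_pstate=passive is present, along with the configured cpufreq.default_governor value. The sample command line is hypothetical and shortened, and the helper names are illustrative.

```python
"""Inspect a /proc/cmdline string for the power saving mode kernel arguments."""
from typing import Dict


def kernel_args(cmdline: str) -> Dict[str, str]:
    """Split a kernel command line into a dict; flags without "=" map to ""."""
    args: Dict[str, str] = {}
    for token in cmdline.split():
        key, _, value = token.partition("=")
        args[key] = value  # in this sketch, a later duplicate overrides an earlier one
    return args


def power_saving_applied(cmdline: str) -> bool:
    return kernel_args(cmdline).get("intel_pstate") == "passive"


# Hypothetical, shortened /proc/cmdline content.
sample = "ro intel_pstate=passive cpufreq.default_governor=schedutil"
print(power_saving_applied(sample), kernel_args(sample).get("cpufreq.default_governor"))  # True schedutil
```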

10.2.6.4. Maximizing power savings

Limiting the maximum CPU frequency is recommended to achieve maximum power savings. Enabling C-states on the non-critical workload CPUs without restricting the maximum CPU frequency negates much of the power savings by boosting the frequency of the critical CPUs.

Maximize power savings by updating the sysfs plugin fields and setting an appropriate value for max_perf_pct in the TunedPerformancePatch CR for the reference configuration. The following example, based on the group-du-sno-ranGen.yaml file, shows how to restrict the maximum CPU frequency.

Prerequisites

You have configured power saving mode as described in "Configuring power saving mode using PolicyGenTemplate CRs".

Procedure

Update the PolicyGenTemplate entry for TunedPerformancePatch in the group-du-sno-ranGen.yaml reference file in out/argocd/example/policygentemplates/. To maximize power savings, add max_perf_pct as shown in the following example:
```
- fileName: TunedPerformancePatch.yaml
  policyName: "config-policy"
  spec:
    profile:
      - name: performance-patch
        data: |
          # ...
          [sysfs]
          /sys/devices/system/cpu/intel_pstate/max_perf_pct=<x> 
```
The max_perf_pct setting controls the maximum frequency that the cpufreq driver is allowed to set, as a percentage of the maximum supported CPU frequency. This value applies to all CPUs. You can check the maximum supported frequency in /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq. As a starting point, you can use a percentage that caps all CPUs at the All Cores Turbo frequency, which is the frequency that all cores run at when they are all fully occupied.
Note
To maximize power savings, set a lower value. Setting a lower value for max_perf_pct limits the maximum CPU frequency, which reduces power consumption but can also affect performance. Experiment with different values and monitor the system’s performance and power consumption to find the optimal setting for your use case.
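The starting-point value is a simple ratio of the All Cores Turbo frequency to cpuinfo_max_freq. The following Python sketch assumes that both inputs are in kHz, the unit used by cpuinfo_max_freq. The All Cores Turbo frequency comes from your processor documentation, and the example frequencies are hypothetical. The sketch rounds down so that the computed percentage never exceeds the exact ratio.

```python
"""Starting-point max_perf_pct that caps CPUs at the All Cores Turbo frequency.

Both inputs are in kHz, the unit of cpuinfo_max_freq. The All Cores Turbo
frequency comes from your processor documentation; example values are hypothetical.
"""


def max_perf_pct(all_core_turbo_khz: int, cpuinfo_max_freq_khz: int) -> int:
    if not 0 < all_core_turbo_khz <= cpuinfo_max_freq_khz:
        raise ValueError("All Cores Turbo must be positive and at most cpuinfo_max_freq")
    # Integer division rounds down, so the percentage never exceeds the exact ratio.
    return all_core_turbo_khz * 100 // cpuinfo_max_freq_khz


# Hypothetical CPU: 3.5 GHz maximum, 2.8 GHz All Cores Turbo.
print(max_perf_pct(2_800_000, 3_500_000))  # 80
```

You would then use the result in place of `<x>` in the TunedPerformancePatch entry, and lower it further if you need more savings.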
Commit the PolicyGenTemplate change in Git, and then push to the Git repository being monitored by the GitOps ZTP Argo CD application.

10.2.7. Configuring LVM Storage using PolicyGenTemplate CRs

You can configure Logical Volume Manager (LVM) Storage for managed clusters that you deploy with GitOps Zero Touch Provisioning (ZTP).

Note

You use LVM Storage to persist event subscriptions when you use PTP events or bare-metal hardware events with HTTP transport.

Use the Local Storage Operator for persistent storage that uses local volumes in distributed units.

Prerequisites

Install the OpenShift CLI (oc).
Log in as a user with cluster-admin privileges.
Create a Git repository where you manage your custom site configuration data.

Procedure

To configure LVM Storage for new managed clusters, add the following YAML to spec.sourceFiles in the common-ranGen.yaml file:
```
- fileName: StorageLVMOSubscriptionNS.yaml
  policyName: subscription-policies
- fileName: StorageLVMOSubscriptionOperGroup.yaml
  policyName: subscription-policies
- fileName: StorageLVMOSubscription.yaml
  spec:
    name: lvms-operator
    channel: stable-4.17
  policyName: subscription-policies
```
Note
The Storage LVMO subscription is deprecated. In future releases of OpenShift Container Platform, the storage LVMO subscription will not be available. Instead, you must use the Storage LVMS subscription.
In OpenShift Container Platform 4.17, you can use the Storage LVMS subscription instead of the LVMO subscription. The LVMS subscription does not require manual overrides in the common-ranGen.yaml file. Add the following YAML to spec.sourceFiles in the common-ranGen.yaml file to use the Storage LVMS subscription:
- fileName: StorageLVMSubscriptionNS.yaml
  policyName: subscription-policies
- fileName: StorageLVMSubscriptionOperGroup.yaml
  policyName: subscription-policies
- fileName: StorageLVMSubscription.yaml
  policyName: subscription-policies
Add the LVMCluster CR to spec.sourceFiles in your specific group or individual site configuration file. For example, in the group-du-sno-ranGen.yaml file, add the following:
```
- fileName: StorageLVMCluster.yaml
  policyName: "lvms-config"
  spec:
    storage:
      deviceClasses:
      - name: vg1
        thinPoolConfig:
          name: thin-pool-1
          sizePercent: 90
          overprovisionRatio: 10
```
This example configuration creates a volume group (vg1) with all the available devices, except the disk where OpenShift Container Platform is installed. A thin-pool logical volume is also created.
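To estimate what the thinPoolConfig values mean for a given disk, you can work through the arithmetic. The following Python sketch assumes that sizePercent is the share of the volume group allocated to the thin pool. It also assumes that overprovisionRatio multiplies the thin pool size to give the total capacity that can be provisioned. The 500 GiB volume group is hypothetical, and the helper and type names are illustrative.

```python
"""Rough thin-pool sizing for the StorageLVMCluster example.

Assumptions: sizePercent is the share of the volume group given to the thin
pool, and overprovisionRatio multiplies the thin pool size to give the total
capacity that can be provisioned. The 500 GiB volume group is hypothetical.
"""
from typing import NamedTuple


class ThinPoolEstimate(NamedTuple):
    thin_pool_gib: float
    provisionable_gib: float


def estimate(vg_size_gib: float, size_percent: int = 90, overprovision_ratio: int = 10) -> ThinPoolEstimate:
    thin_pool = vg_size_gib * size_percent / 100
    return ThinPoolEstimate(thin_pool, thin_pool * overprovision_ratio)


print(estimate(500))  # ThinPoolEstimate(thin_pool_gib=450.0, provisionable_gib=4500.0)
```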
Merge any other required changes and files with your custom site repository.
Commit the PolicyGenTemplate changes in Git, and then push the changes to your site configuration repository to deploy LVM Storage to new sites using GitOps ZTP.

10.2.8. Configuring PTP events with PolicyGenTemplate CRs

You can use the GitOps ZTP pipeline to configure PTP events that use HTTP transport.

10.2.8.1. Configuring PTP events that use HTTP transport

You can configure PTP events that use HTTP transport on managed clusters that you deploy with the GitOps Zero Touch Provisioning (ZTP) pipeline.

Prerequisites

You have installed the OpenShift CLI (oc).
You have logged in as a user with cluster-admin privileges.
You have created a Git repository where you manage your custom site configuration data.

Procedure

Apply the following PolicyGenTemplate changes to group-du-3node-ranGen.yaml, group-du-sno-ranGen.yaml, or group-du-standard-ranGen.yaml files according to your requirements:
1. In spec.sourceFiles, add the PtpOperatorConfig CR file that configures the transport host:
  - fileName: PtpOperatorConfigForEvent.yaml
    policyName: "config-policy"
    spec:
      daemonNodeSelector: {}
      ptpEventConfig:
        enableEventPublisher: true
        transportHost: http://ptp-event-publisher-service-NODE_NAME.openshift-ptp.svc.cluster.local:9043
  Note
  In OpenShift Container Platform 4.13 or later, you do not need to set the transportHost field in the PtpOperatorConfig resource when you use HTTP transport with PTP events.
2. Configure linuxptp and phc2sys for the PTP clock type and interface. For example, add the following YAML into spec.sourceFiles:
  - fileName: PtpConfigSlave.yaml # 1
    policyName: "config-policy"
    metadata:
      name: "du-ptp-slave"
    spec:
      profile:
      - name: "slave"
        interface: "ens5f1" # 2
        ptp4lOpts: "-2 -s --summary_interval -4" # 3
        phc2sysOpts: "-a -r -m -n 24 -N 8 -R 16" # 4
        ptpClockThreshold: # 5
          holdOverTimeout: 30 # seconds
          maxOffsetThreshold: 100 # nanoseconds
          minOffsetThreshold: -100 # nanoseconds
  1: Can be PtpConfigMaster.yaml or PtpConfigSlave.yaml, depending on your requirements. For configurations based on group-du-sno-ranGen.yaml or group-du-3node-ranGen.yaml, use PtpConfigSlave.yaml.
  2: Device-specific interface name.
  3: You must append the --summary_interval -4 value to ptp4lOpts in .spec.sourceFiles.spec.profile to enable PTP fast events.
  4: Required phc2sysOpts values. -m prints messages to stdout. The linuxptp-daemon DaemonSet parses the logs and generates Prometheus metrics.
  5: Optional. If the ptpClockThreshold stanza is not present, default values are used for the ptpClockThreshold fields. The stanza shows the default ptpClockThreshold values. The ptpClockThreshold values configure how long to wait after the PTP master clock is disconnected before PTP events are triggered. holdOverTimeout is the time in seconds before the PTP clock event state changes to FREERUN when the PTP master clock is disconnected. The maxOffsetThreshold and minOffsetThreshold settings configure offset values in nanoseconds that are compared against the values for CLOCK_REALTIME (phc2sys) or master offset (ptp4l). When the ptp4l or phc2sys offset value is outside this range, the PTP clock state is set to FREERUN. When the offset value is within this range, the PTP clock state is set to LOCKED.
Merge any other required changes and files with your custom site repository.
Push the changes to your site configuration repository to deploy PTP fast events to new sites using GitOps ZTP.
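The ptpClockThreshold rules described above can be summarized as a small state function. The following Python sketch is a simplified model, not the linuxptp-daemon logic, which is more involved. It assumes that the thresholds default to the values in the ptpClockThreshold stanza. It also assumes that the clock reports HOLDOVER while the master clock has been lost for less than holdOverTimeout. The class and function names are hypothetical.

```python
"""Simplified model of how ptpClockThreshold values map to PTP clock states.

Assumptions: thresholds default to the values in the ptpClockThreshold stanza,
and the state is HOLDOVER while the master clock has been lost for less than
holdOverTimeout. The real linuxptp-daemon logic is more involved.
"""
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ClockThreshold:
    hold_over_timeout_s: float = 30
    max_offset_ns: int = 100
    min_offset_ns: int = -100


def clock_state(
    offset_ns: Optional[int],
    seconds_since_master_lost: Optional[float] = None,
    threshold: Optional[ClockThreshold] = None,
) -> str:
    t = threshold or ClockThreshold()
    if seconds_since_master_lost is not None:
        # Master clock disconnected: hold over until the timeout expires.
        return "HOLDOVER" if seconds_since_master_lost < t.hold_over_timeout_s else "FREERUN"
    if offset_ns is not None and t.min_offset_ns <= offset_ns <= t.max_offset_ns:
        return "LOCKED"
    return "FREERUN"  # offset unknown or outside [minOffsetThreshold, maxOffsetThreshold]


print(clock_state(42), clock_state(250), clock_state(None, seconds_since_master_lost=45))  # LOCKED FREERUN FREERUN
```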

10.2.9. Configuring the Image Registry Operator for local caching of images

OpenShift Container Platform manages image caching using a local registry. In edge computing use cases, clusters are often subject to bandwidth restrictions when communicating with centralized image registries, which might result in long image download times.

Long download times are unavoidable during initial deployment. Over time, there is also a risk that CRI-O will erase the /var/lib/containers/storage directory in the case of an unexpected shutdown, which forces the images to be downloaded again. To address long image download times, you can create a local image registry on remote managed clusters using GitOps Zero Touch Provisioning (ZTP). This is useful in edge computing scenarios where clusters are deployed at the far edge of the network.

Before you can set up the local image registry with GitOps ZTP, you need to configure disk partitioning in the SiteConfig CR that you use to install the remote managed cluster. After installation, you configure the local image registry using a PolicyGenTemplate CR. Then, the GitOps ZTP pipeline creates Persistent Volume (PV) and Persistent Volume Claim (PVC) CRs and patches the imageregistry configuration.

Note

The local image registry can only be used for user application images and cannot be used for the OpenShift Container Platform or Operator Lifecycle Manager operator images.

10.2.9.1. Configuring disk partitioning with SiteConfig

Configure disk partitioning for a managed cluster using a SiteConfig CR and GitOps Zero Touch Provisioning (ZTP). The disk partition details in the SiteConfig CR must match the underlying disk.

Important

You must complete this procedure at installation time.

Prerequisites

Install Butane.

Procedure

Create the storage.bu file.

variant: fcos
version: 1.3.0
storage:
  disks:
  - device: /dev/disk/by-path/pci-0000:01:00.0-scsi-0:2:0:0 # 1
    wipe_table: false
    partitions:
    - label: var-lib-containers
      start_mib: <start_of_partition> # 2
      size_mib: <partition_size> # 3
  filesystems:
    - path: /var/lib/containers
      device: /dev/disk/by-partlabel/var-lib-containers
      format: xfs
      wipe_filesystem: true
      with_mount_unit: true
      mount_options:
        - defaults
        - prjquota

1: Specify the root disk.
2: Specify the start of the partition in MiB. If the value is too small, the installation fails.
3: Specify the size of the partition in MiB. If the value is too small, the deployment fails.

Convert the storage.bu to an Ignition file by running the following command:

$ butane storage.bu

Example output

{"ignition":{"version":"3.2.0"},"storage":{"disks":[{"device":"/dev/disk/by-path/pci-0000:01:00.0-scsi-0:2:0:0","partitions":[{"label":"var-lib-containers","sizeMiB":0,"startMiB":250000}],"wipeTable":false}],"filesystems":[{"device":"/dev/disk/by-partlabel/var-lib-containers","format":"xfs","mountOptions":["defaults","prjquota"],"path":"/var/lib/containers","wipeFilesystem":true}]},"systemd":{"units":[{"contents":"# # Generated by Butane\n[Unit]\nRequires=systemd-fsck@dev-disk-by\\x2dpartlabel-var\\x2dlib\\x2dcontainers.service\nAfter=systemd-fsck@dev-disk-by\\x2dpartlabel-var\\x2dlib\\x2dcontainers.service\n\n[Mount]\nWhere=/var/lib/containers\nWhat=/dev/disk/by-partlabel/var-lib-containers\nType=xfs\nOptions=defaults,prjquota\n\n[Install]\nRequiredBy=local-fs.target","enabled":true,"name":"var-lib-containers.mount"}]}}

Use a tool such as JSON Pretty Print to format the output as readable, indented JSON.

Copy the output into the .spec.clusters.nodes.ignitionConfigOverride field in the SiteConfig CR.

Example

[...]
spec:
  clusters:
    - nodes:
        - ignitionConfigOverride: |
            {
              "ignition": {
                "version": "3.2.0"
              },
              "storage": {
                "disks": [
                  {
                    "device": "/dev/disk/by-path/pci-0000:01:00.0-scsi-0:2:0:0",
                    "partitions": [
                      {
                        "label": "var-lib-containers",
                        "sizeMiB": 0,
                        "startMiB": 250000
                      }
                    ],
                    "wipeTable": false
                  }
                ],
                "filesystems": [
                  {
                    "device": "/dev/disk/by-partlabel/var-lib-containers",
                    "format": "xfs",
                    "mountOptions": [
                      "defaults",
                      "prjquota"
                    ],
                    "path": "/var/lib/containers",
                    "wipeFilesystem": true
                  }
                ]
              },
              "systemd": {
                "units": [
                  {
                    "contents": "# # Generated by Butane\n[Unit]\nRequires=systemd-fsck@dev-disk-by\\x2dpartlabel-var\\x2dlib\\x2dcontainers.service\nAfter=systemd-fsck@dev-disk-by\\x2dpartlabel-var\\x2dlib\\x2dcontainers.service\n\n[Mount]\nWhere=/var/lib/containers\nWhat=/dev/disk/by-partlabel/var-lib-containers\nType=xfs\nOptions=defaults,prjquota\n\n[Install]\nRequiredBy=local-fs.target",
                    "enabled": true,
                    "name": "var-lib-containers.mount"
                  }
                ]
              }
            }
[...]

Note

If the .spec.clusters.nodes.ignitionConfigOverride field does not exist, create it.
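If you paste pretty-printed Butane output into the SiteConfig CR by hand, every line must be indented further than the ignitionConfigOverride key so that it stays inside the YAML literal block. The following POSIX shell sketch shows one way to do that. The indent_block function name and the indentation width are illustrative assumptions, not part of GitOps ZTP.

```shell
# indent_block: prefix every line read from stdin with N spaces ($1).
# Hypothetical helper; choose N so the text is indented further than the
# "ignitionConfigOverride" key in your SiteConfig CR.
indent_block() {
  pad=''
  i=0
  while [ "$i" -lt "$1" ]; do
    pad="$pad "
    i=$((i + 1))
  done
  sed "s/^/$pad/"
}

# Example usage (assumes butane and python3 are installed):
#   butane storage.bu | python3 -m json.tool | indent_block 12
```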

Verification

During or after installation, verify on the hub cluster that the BareMetalHost object shows the annotation by running the following command:

$ oc get bmh -n my-sno-ns my-sno -ojson | jq '.metadata.annotations["bmac.agent-install.openshift.io/ignition-config-overrides"]'

Example output

"{\"ignition\":{\"version\":\"3.2.0\"},\"storage\":{\"disks\":[{\"device\":\"/dev/disk/by-id/wwn-0x6b07b250ebb9d0002a33509f24af1f62\",\"partitions\":[{\"label\":\"var-lib-containers\",\"sizeMiB\":0,\"startMiB\":250000}],\"wipeTable\":false}],\"filesystems\":[{\"device\":\"/dev/disk/by-partlabel/var-lib-containers\",\"format\":\"xfs\",\"mountOptions\":[\"defaults\",\"prjquota\"],\"path\":\"/var/lib/containers\",\"wipeFilesystem\":true}]},\"systemd\":{\"units\":[{\"contents\":\"# Generated by Butane\\n[Unit]\\nRequires=systemd-fsck@dev-disk-by\\\\x2dpartlabel-var\\\\x2dlib\\\\x2dcontainers.service\\nAfter=systemd-fsck@dev-disk-by\\\\x2dpartlabel-var\\\\x2dlib\\\\x2dcontainers.service\\n\\n[Mount]\\nWhere=/var/lib/containers\\nWhat=/dev/disk/by-partlabel/var-lib-containers\\nType=xfs\\nOptions=defaults,prjquota\\n\\n[Install]\\nRequiredBy=local-fs.target\",\"enabled\":true,\"name\":\"var-lib-containers.mount\"}]}}"

After installation, check the single-node OpenShift disk status.

Start a debug session on the single-node OpenShift node by running the following command. This step instantiates a debug pod called <node_name>-debug:
```
$ oc debug node/my-sno-node
```
Set /host as the root directory within the debug shell by running the following command. The debug pod mounts the host’s root file system in /host within the pod. By changing the root directory to /host, you can run binaries contained in the host’s executable paths:
```
# chroot /host
```

List information about all available block devices by running the following command:

# lsblk

Example output

NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
sda      8:0    0 446.6G  0 disk
├─sda1   8:1    0     1M  0 part
├─sda2   8:2    0   127M  0 part
├─sda3   8:3    0   384M  0 part /boot
├─sda4   8:4    0 243.6G  0 part /var
│                                /sysroot/ostree/deploy/rhcos/var
│                                /usr
│                                /etc
│                                /
│                                /sysroot
└─sda5   8:5    0 202.5G  0 part /var/lib/containers
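To check the partition without reading the tree by eye, you can filter the lsblk output. This is a minimal awk-based sketch that assumes the default lsblk column order shown above (NAME, MAJ:MIN, RM, SIZE, RO, TYPE, MOUNTPOINTS); part_mounted_at is a hypothetical helper name.

```shell
# part_mounted_at: read lsblk output on stdin and print the NAME and SIZE of
# the partition whose first mount point is $1. Returns non-zero if none is found.
# Assumes the default lsblk columns: NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS.
part_mounted_at() {
  awk -v mp="$1" '
    $6 == "part" && $7 == mp { print $1, $4; found = 1 }
    END { exit (found ? 0 : 1) }
  '
}

# Example usage inside the debug shell:
#   lsblk | part_mounted_at /var/lib/containers
```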

Display information about the file system disk space usage by running the following command:

# df -h

Example output

Filesystem      Size  Used Avail Use% Mounted on
devtmpfs        4.0M     0  4.0M   0% /dev
tmpfs           126G   84K  126G   1% /dev/shm
tmpfs            51G   93M   51G   1% /run
/dev/sda4       244G  5.2G  239G   3% /sysroot
tmpfs           126G  4.0K  126G   1% /tmp
/dev/sda5       203G  119G   85G  59% /var/lib/containers
/dev/sda3       350M  110M  218M  34% /boot
tmpfs            26G     0   26G   0% /run/user/1000
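If you want to script the disk-space check, the sketch below extracts the Use% value for one mount point from df output and compares it to a threshold. The function names and the threshold are hypothetical. Running df -hP instead of df -h keeps each file system on a single line, which the column-based parsing relies on.

```shell
# usage_pct: read df output on stdin and print the Use% value (without the
# % sign) for the file system mounted at $1. Returns non-zero if not found.
usage_pct() {
  awk -v mp="$1" '
    $6 == mp { sub(/%/, "", $5); print $5; found = 1 }
    END { exit (found ? 0 : 1) }
  '
}

# check_usage: succeed if the file system at $1 is at or below $2 percent.
check_usage() {
  pct=$(usage_pct "$1") || return 2
  [ "$pct" -le "$2" ]
}

# Example usage inside the debug shell (80 is an arbitrary example threshold):
#   df -hP | check_usage /var/lib/containers 80 || echo "partition nearly full"
```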

10.2.9.2. Configuring the image registry using PolicyGenTemplate CRs

Use PolicyGenTemplate (PGT) CRs to apply the CRs required to configure the image registry and patch the imageregistry configuration.

Prerequisites

You have configured a disk partition in the managed cluster.
You have installed the OpenShift CLI (oc).
You have logged in to the hub cluster as a user with cluster-admin privileges.
You have created a Git repository where you manage your custom site configuration data for use with GitOps Zero Touch Provisioning (ZTP).

Procedure

Configure the storage class, persistent volume claim, persistent volume, and image registry configuration in the appropriate PolicyGenTemplate CR. For example, to configure an individual site, add the following YAML to the file example-sno-site.yaml:

sourceFiles:
  # storage class
  - fileName: StorageClass.yaml
    policyName: "sc-for-image-registry"
    metadata:
      name: image-registry-sc
      annotations:
        ran.openshift.io/ztp-deploy-wave: "100" # 1
  # persistent volume claim
  - fileName: StoragePVC.yaml
    policyName: "pvc-for-image-registry"
    metadata:
      name: image-registry-pvc
      namespace: openshift-image-registry
      annotations:
        ran.openshift.io/ztp-deploy-wave: "100"
    spec:
      accessModes:
        - ReadWriteMany
      resources:
        requests:
          storage: 100Gi
      storageClassName: image-registry-sc
      volumeMode: Filesystem
  # persistent volume
  - fileName: ImageRegistryPV.yaml # 2
    policyName: "pv-for-image-registry"
    metadata:
      annotations:
        ran.openshift.io/ztp-deploy-wave: "100"
  - fileName: ImageRegistryConfig.yaml
    policyName: "config-for-image-registry"
    complianceType: musthave
    metadata:
      annotations:
        ran.openshift.io/ztp-deploy-wave: "100"
    spec:
      storage:
        pvc:
          claim: "image-registry-pvc"

1: Set the appropriate value for ztp-deploy-wave depending on whether you are configuring image registries at the site, common, or group level. ztp-deploy-wave: "100" is suitable for development or testing because it allows you to group the referenced source files together.
2: In ImageRegistryPV.yaml, ensure that the spec.local.path field is set to /var/imageregistry to match the value set for the mount_point field in the SiteConfig CR.

Important

Do not set complianceType: mustonlyhave for the - fileName: ImageRegistryConfig.yaml configuration. This can cause the registry pod deployment to fail.
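Because the complianceType setting is easy to get wrong when you copy entries between PolicyGenTemplate files, you might lint the file before you commit it. The following is a rough awk sketch, not a YAML parser. It assumes that each source file entry starts with a "- fileName:" line, and lint_registry_config is a hypothetical name.

```shell
# lint_registry_config: read a PolicyGenTemplate file on stdin and fail if the
# ImageRegistryConfig.yaml entry sets complianceType: mustonlyhave.
# Line-based sketch: each entry is assumed to start with "- fileName:".
lint_registry_config() {
  awk '
    /-[ \t]*fileName:/ { inblock = ($0 ~ /ImageRegistryConfig\.yaml/) }
    inblock && /complianceType:[ \t]*"?mustonlyhave"?/ {
      print "line " NR ": do not use mustonlyhave for ImageRegistryConfig.yaml"
      bad = 1
    }
    END { exit (bad ? 1 : 0) }
  '
}

# Example usage:
#   lint_registry_config < example-sno-site.yaml
```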

Commit the PolicyGenTemplate change in Git, and then push to the Git repository being monitored by the GitOps ZTP ArgoCD application.

Verification

Use the following steps to troubleshoot errors with the local image registry on the managed clusters:

Verify successful login to the registry while logged in to the managed cluster. Run the following commands:

Set the managed cluster name:
```
$ cluster=<managed_cluster_name>
```

Save the managed cluster kubeadmin password:

$ oc get secret -n $cluster $cluster-admin-password -o jsonpath='{.data.password}' | base64 -d > kubeadmin-password-$cluster

Download and export the cluster kubeconfig:

$ oc get secret -n $cluster $cluster-admin-kubeconfig -o jsonpath='{.data.kubeconfig}' | base64 -d > kubeconfig-$cluster && export KUBECONFIG=./kubeconfig-$cluster

Verify access to the image registry from the managed cluster. See "Accessing the registry".

Check that the cluster instance of the Config CRD in the imageregistry.operator.openshift.io group is not reporting errors. Run the following command while logged in to the managed cluster:

$ oc get image.config.openshift.io cluster -o yaml

Example output

apiVersion: config.openshift.io/v1
kind: Image
metadata:
  annotations:
    include.release.openshift.io/ibm-cloud-managed: "true"
    include.release.openshift.io/self-managed-high-availability: "true"
    include.release.openshift.io/single-node-developer: "true"
    release.openshift.io/create-only: "true"
  creationTimestamp: "2021-10-08T19:02:39Z"
  generation: 5
  name: cluster
  resourceVersion: "688678648"
  uid: 0406521b-39c0-4cda-ba75-873697da75a4
spec:
  additionalTrustedCA:
    name: acm-ice

Check that the PersistentVolumeClaim on the managed cluster is created and bound. Run the following command while logged in to the managed cluster:
```
$ oc get pvc image-registry-pvc -n openshift-image-registry
```

Check that the registry pods are running in the openshift-image-registry namespace:

$ oc get pods -n openshift-image-registry | grep registry

Example output

cluster-image-registry-operator-68f5c9c589-42cfg   1/1     Running     0          8d
image-registry-5f8987879-6nx6h                     1/1     Running     0          8d
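The pod check above can also be scripted. This sketch reads the filtered oc get pods output and fails unless every listed pod is Running with all containers ready. pods_ready is a hypothetical helper, and it assumes the default NAME, READY, STATUS column order.

```shell
# pods_ready: read `oc get pods` output on stdin and fail unless every pod is
# Running and its READY column shows all containers ready (for example 1/1).
pods_ready() {
  awk '
    $1 == "NAME" { next }                 # skip the header row if present
    NF == 0 { next }                      # ignore blank lines
    {
      seen = 1
      split($2, r, "/")
      if ($3 != "Running" || r[1] != r[2]) { print "not ready: " $1; bad = 1 }
    }
    END { exit ((bad || !seen) ? 1 : 0) }
  '
}

# Example usage:
#   oc get pods -n openshift-image-registry | grep registry | pods_ready
```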

Check that the disk partition on the managed cluster is correct:

Open a debug shell to the managed cluster:
```
$ oc debug node/sno-1.example.com
```

Run lsblk to check the host disk partitions:

sh-4.4# lsblk
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda      8:0    0 446.6G  0 disk
|-sda1   8:1    0     1M  0 part
|-sda2   8:2    0   127M  0 part
|-sda3   8:3    0   384M  0 part /boot
|-sda4   8:4    0 336.3G  0 part /sysroot
`-sda5   8:5    0 100.1G  0 part /var/imageregistry
sdb      8:16   0 446.6G  0 disk
sr0     11:0    1   104M  0 rom

1: /var/imageregistry indicates that the disk is correctly partitioned.

10.3. Updating managed clusters in a disconnected environment with PolicyGenTemplate resources and TALM

You can use the Topology Aware Lifecycle Manager (TALM) to manage the software lifecycle of managed clusters that you have deployed by using GitOps Zero Touch Provisioning (ZTP). TALM uses Red Hat Advanced Cluster Management (RHACM) PolicyGenTemplate policies to manage and control changes applied to target clusters.

Important

For more information about PolicyGenerator resources, see the RHACM Policy Generator documentation.

10.3.1. Setting up the disconnected environment

TALM can perform both platform and Operator updates.

You must mirror both the platform image and Operator images that you want to update to in your mirror registry before you can use TALM to update your disconnected clusters. Complete the following steps to mirror the images:

For platform updates, you must perform the following steps:
1. Mirror the desired OpenShift Container Platform image repository. Ensure that the desired platform image is mirrored by following the "Mirroring the OpenShift Container Platform image repository" procedure linked in the Additional resources. Save the contents of the imageContentSources section in the imageContentSources.yaml file:
  Example output
  imageContentSources:
  - mirrors:
    - mirror-ocp-registry.ibmcloud.io.cpak:5000/openshift-release-dev/openshift4
    source: quay.io/openshift-release-dev/ocp-release
  - mirrors:
    - mirror-ocp-registry.ibmcloud.io.cpak:5000/openshift-release-dev/openshift4
    source: quay.io/openshift-release-dev/ocp-v4.0-art-dev
2. Save the image signature of the desired platform image that was mirrored. You must add the image signature to the PolicyGenTemplate CR for platform updates. To get the image signature, perform the following steps:
  1. Specify the desired OpenShift Container Platform tag by running the following command:
    
    $ OCP_RELEASE_NUMBER=<release_version>
  2. Specify the architecture of the cluster by running the following command:
    
    $ ARCHITECTURE=<cluster_architecture>
    Replace <cluster_architecture> with the architecture of the cluster, such as x86_64, aarch64, s390x, or ppc64le.
  3. Get the release image digest from Quay by running the following command:
    
    $ DIGEST="$(oc adm release info quay.io/openshift-release-dev/ocp-release:${OCP_RELEASE_NUMBER}-${ARCHITECTURE} | sed -n 's/Pull From: .*@//p')"
  4. Set the digest algorithm by running the following command:
    
    $ DIGEST_ALGO="${DIGEST%%:*}"
  5. Set the encoded digest value by running the following command:
    
    $ DIGEST_ENCODED="${DIGEST#*:}"
  6. Get the image signature from the mirror.openshift.com website by running the following command:
    
    $ SIGNATURE_BASE64=$(curl -s "https://mirror.openshift.com/pub/openshift-v4/signatures/openshift/release/${DIGEST_ALGO}=${DIGEST_ENCODED}/signature-1" | base64 -w0 && echo)
  7. Save the image signature to the checksum-<OCP_RELEASE_NUMBER>.yaml file by running the following commands:
    
    $ cat >checksum-${OCP_RELEASE_NUMBER}.yaml <<EOF
    ${DIGEST_ALGO}-${DIGEST_ENCODED}: ${SIGNATURE_BASE64}
    EOF
3. Prepare the update graph. You have two options to prepare the update graph:
  1. Use the OpenShift Update Service.
    For more information about how to set up the graph on the hub cluster, see Deploy the operator for OpenShift Update Service and Build the graph data init container.
  2. Make a local copy of the upstream graph. Host the update graph on an http or https server in the disconnected environment that the managed cluster can access. To download the update graph, use the following command:
    
    $ curl -s "https://api.openshift.com/api/upgrades_info/v1/graph?channel=stable-4.17" -o ~/upgrade-graph_stable-4.17
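The digest and signature steps above rely only on shell parameter expansion, so you can check them offline. The following sketch wraps them in a hypothetical make_checksum_line function that reads oc adm release info output and prints the checksum entry. It also adds a hypothetical graph_has_version check for the locally downloaded update graph, which assumes the graph is JSON with a "version" field on each node.

```shell
# make_checksum_line: read `oc adm release info` output on stdin, take the
# digest from the "Pull From:" line, and print the
# "<algorithm>-<encoded digest>: <signature>" entry for checksum-<version>.yaml.
# $1 is the base64-encoded signature (SIGNATURE_BASE64 in the steps above).
make_checksum_line() {
  digest=$(sed -n 's/Pull From: .*@//p')
  [ -n "$digest" ] || return 1
  algo=${digest%%:*}      # same as DIGEST_ALGO, for example sha256
  encoded=${digest#*:}    # same as DIGEST_ENCODED
  printf '%s-%s: %s\n' "$algo" "$encoded" "$1"
}

# graph_has_version: succeed if the update graph file $1 lists release $2.
# Whitespace is removed first so that spacing in the JSON does not matter.
graph_has_version() {
  tr -d ' \t\n' < "$1" | grep -qF "\"version\":\"$2\""
}

# Example usage (assumes the variables from the steps above are set):
#   oc adm release info "quay.io/openshift-release-dev/ocp-release:${OCP_RELEASE_NUMBER}-${ARCHITECTURE}" | make_checksum_line "${SIGNATURE_BASE64}"
#   graph_has_version ~/upgrade-graph_stable-4.17 4.17.4
```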
For Operator updates, you must perform the following task:
- Mirror the Operator catalogs. Ensure that the desired Operator images are mirrored by following the procedure in the "Mirroring Operator catalogs for use with disconnected clusters" section.

10.3.2. Performing a platform update with PolicyGenTemplate CRs

You can perform a platform update with TALM.

Prerequisites

Install the Topology Aware Lifecycle Manager (TALM).
Update GitOps Zero Touch Provisioning (ZTP) to the latest version.
Provision one or more managed clusters with GitOps ZTP.
Mirror the desired image repository.
Log in as a user with cluster-admin privileges.
Create RHACM policies in the hub cluster.

Procedure

Create a PolicyGenTemplate CR for the platform update:
1. Save the following PolicyGenTemplate CR in the du-upgrade.yaml file:
  Example of PolicyGenTemplate for platform update
  apiVersion: ran.openshift.io/v1
  kind: PolicyGenTemplate
  metadata:
    name: "du-upgrade"
    namespace: "ztp-group-du-sno"
  spec:
    bindingRules:
      group-du-sno: ""
    mcp: "master"
    remediationAction: inform
    sourceFiles:
      - fileName: ImageSignature.yaml # 1
        policyName: "platform-upgrade-prep"
        binaryData:
          ${DIGEST_ALGO}-${DIGEST_ENCODED}: ${SIGNATURE_BASE64} # 2
      - fileName: DisconnectedICSP.yaml
        policyName: "platform-upgrade-prep"
        metadata:
          name: disconnected-internal-icsp-for-ocp
        spec:
          repositoryDigestMirrors: # 3
            - mirrors:
                - quay-intern.example.com/ocp4/openshift-release-dev
              source: quay.io/openshift-release-dev/ocp-release
            - mirrors:
                - quay-intern.example.com/ocp4/openshift-release-dev
              source: quay.io/openshift-release-dev/ocp-v4.0-art-dev
      - fileName: ClusterVersion.yaml # 4
        policyName: "platform-upgrade"
        metadata:
          name: version
        spec:
          channel: "stable-4.17"
          upstream: http://upgrade.example.com/images/upgrade-graph_stable-4.17
          desiredUpdate:
            version: 4.17.4
        status:
          history:
            - version: 4.17.4
              state: "Completed"
  1: The ConfigMap CR contains the signature of the desired release image to update to.
  2: Shows the image signature of the desired OpenShift Container Platform release. Get the signature from the checksum-${OCP_RELEASE_NUMBER}.yaml file you saved when following the procedures in the "Setting up the disconnected environment" section.
  3: Shows the mirror repository that contains the desired OpenShift Container Platform image. Get the mirrors from the imageContentSources.yaml file that you saved when following the procedures in the "Setting up the disconnected environment" section.
  4: Shows the ClusterVersion CR to trigger the update. The channel, upstream, and desiredUpdate fields are all required for image pre-caching.
  The PolicyGenTemplate CR generates two policies:
  - The du-upgrade-platform-upgrade-prep policy does the preparation work for the platform update. It creates the ConfigMap CR for the desired release image signature, creates the image content source of the mirrored release image repository, and updates the cluster version with the desired update channel and the update graph reachable by the managed cluster in the disconnected environment.
  - The du-upgrade-platform-upgrade policy performs the platform update.
2. Add the du-upgrade.yaml file to the kustomization.yaml file located in the GitOps ZTP Git repository for the PolicyGenTemplate CRs and push the changes to the Git repository.
  ArgoCD pulls the changes from the Git repository and generates the policies on the hub cluster.
3. Check the created policies by running the following command:
  $ oc get policies -A | grep platform-upgrade
Create the ClusterGroupUpgrade CR for the platform update with the spec.enable field set to false.
1. Save the content of the platform update ClusterGroupUpgrade CR with the du-upgrade-platform-upgrade-prep and the du-upgrade-platform-upgrade policies and the target clusters to the cgu-platform-upgrade.yml file, as shown in the following example:
  apiVersion: ran.openshift.io/v1alpha1
  kind: ClusterGroupUpgrade
  metadata:
    name: cgu-platform-upgrade
    namespace: default
  spec:
    managedPolicies:
      - du-upgrade-platform-upgrade-prep
      - du-upgrade-platform-upgrade
    preCaching: false
    clusters:
      - spoke1
    remediationStrategy:
      maxConcurrency: 1
    enable: false
2. Apply the ClusterGroupUpgrade CR to the hub cluster by running the following command:
  $ oc apply -f cgu-platform-upgrade.yml
Optional: Pre-cache the images for the platform update.
1. Enable pre-caching in the ClusterGroupUpgrade CR by running the following command:
  $ oc --namespace=default patch clustergroupupgrade.ran.openshift.io/cgu-platform-upgrade \
    --patch '{"spec":{"preCaching": true}}' --type=merge
2. Monitor the update process and wait for the pre-caching to complete. Check the status of pre-caching by running the following command on the hub cluster:
  $ oc get cgu -n default cgu-platform-upgrade -o jsonpath='{.status.precaching.status}'
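Instead of rerunning the status command by hand, you can poll it. This generic sketch (wait_for_value is a hypothetical helper) reruns a command until its output matches an expected string or the retry limit is reached. Which value signals completion depends on your TALM version; the PrecachingDone condition in the usage comment is taken from the example output in the Operator update procedure and is an assumption for your environment.

```shell
# wait_for_value: run the command string $1 every $3 seconds, at most $4 times,
# until its output equals $2. Returns 0 on a match and 1 on timeout.
wait_for_value() {
  cmd=$1 want=$2 interval=$3 tries=$4
  n=0
  while [ "$n" -lt "$tries" ]; do
    if got=$(eval "$cmd") && [ "$got" = "$want" ]; then
      return 0
    fi
    n=$((n + 1))
    sleep "$interval"
  done
  return 1
}

# Example usage on the hub cluster (poll every 60 seconds, up to 120 times):
#   wait_for_value "oc get cgu -n default cgu-platform-upgrade -o jsonpath='{.status.conditions[?(@.type==\"PrecachingDone\")].status}'" True 60 120
```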
Start the platform update:
1. Enable the cgu-platform-upgrade policy and disable pre-caching by running the following command:
  $ oc --namespace=default patch clustergroupupgrade.ran.openshift.io/cgu-platform-upgrade \
    --patch '{"spec":{"enable":true, "preCaching": false}}' --type=merge
2. Monitor the process. Upon completion, ensure that the policy is compliant by running the following command:
  $ oc get policies --all-namespaces
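With many policies on the hub cluster, it can help to list only the ones that are not yet compliant. This is a minimal sketch that assumes the default column order of oc get policies --all-namespaces (NAMESPACE, NAME, REMEDIATION ACTION, COMPLIANCE STATE, AGE); noncompliant_policies is a hypothetical helper.

```shell
# noncompliant_policies: read `oc get policies --all-namespaces` output on stdin
# and print namespace/name for each policy whose compliance state is not
# Compliant (including policies with no state yet). Exits 1 if any are found.
noncompliant_policies() {
  awk '
    $1 == "NAMESPACE" { next }                  # skip the header row
    NF > 0 && $4 != "Compliant" { print $1 "/" $2; bad = 1 }
    END { exit (bad ? 1 : 0) }
  '
}

# Example usage:
#   oc get policies --all-namespaces | noncompliant_policies && echo "all compliant"
```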

10.3.3. Performing an Operator update with PolicyGenTemplate CRs

You can perform an Operator update with TALM.

Prerequisites

Install the Topology Aware Lifecycle Manager (TALM).
Update GitOps Zero Touch Provisioning (ZTP) to the latest version.
Provision one or more managed clusters with GitOps ZTP.
Mirror the desired index image, bundle images, and all Operator images referenced in the bundle images.
Log in as a user with cluster-admin privileges.
Create RHACM policies in the hub cluster.

Procedure

Update the PolicyGenTemplate CR for the Operator update.
1. Update the du-upgrade PolicyGenTemplate CR with the following additional contents in the du-upgrade.yaml file:
  apiVersion: ran.openshift.io/v1
  kind: PolicyGenTemplate
  metadata:
    name: "du-upgrade"
    namespace: "ztp-group-du-sno"
  spec:
    bindingRules:
      group-du-sno: ""
    mcp: "master"
    remediationAction: inform
    sourceFiles:
      - fileName: DefaultCatsrc.yaml
        remediationAction: inform
        policyName: "operator-catsrc-policy"
        metadata:
          name: redhat-operators-disconnected
        spec:
          displayName: Red Hat Operators Catalog
          image: registry.example.com:5000/olm/redhat-operators-disconnected:v4.17 # 1
          updateStrategy: # 2
            registryPoll:
              interval: 1h
        status:
          connectionState:
            lastObservedState: READY # 3
  1: The index image URL contains the desired Operator images. If the index images are always pushed to the same image name and tag, this change is not needed.
  2: Set how frequently the Operator Lifecycle Manager (OLM) polls the index image for new Operator versions with the registryPoll.interval field. This change is not needed if a new index image tag is always pushed for y-stream and z-stream Operator updates. You can set the registryPoll.interval field to a shorter interval to expedite the update; however, shorter intervals increase computational load. To counteract this behavior, you can restore registryPoll.interval to the default value once the update is complete.
  3: Last observed state of the catalog connection. The READY value ensures that the CatalogSource policy is ready, indicating that the index pod is pulled and is running. This way, TALM upgrades the Operators based on up-to-date policy compliance states.
2. This update generates one policy, du-upgrade-operator-catsrc-policy, to update the redhat-operators-disconnected catalog source with the new index images that contain the desired Operators images.
  Note
  If you want to use image pre-caching for Operators and there are Operators from a catalog source other than redhat-operators-disconnected, you must perform the following tasks:
  Prepare a separate catalog source policy with the new index image or registry poll interval update for the different catalog source.
  Prepare a separate subscription policy for the desired Operators that are from the different catalog source.
  For example, the desired SRIOV-FEC Operator is available in the certified-operators catalog source. To update the catalog source and the Operator subscription, add the following contents to generate two policies, du-upgrade-fec-catsrc-policy and du-upgrade-subscriptions-fec-policy:
  apiVersion: ran.openshift.io/v1
  kind: PolicyGenTemplate
  metadata:
    name: "du-upgrade"
    namespace: "ztp-group-du-sno"
  spec:
    bindingRules:
      group-du-sno: ""
    mcp: "master"
    remediationAction: inform
    sourceFiles:
      # ...
      - fileName: DefaultCatsrc.yaml
        remediationAction: inform
        policyName: "fec-catsrc-policy"
        metadata:
          name: certified-operators
        spec:
          displayName: Intel SRIOV-FEC Operator
          image: registry.example.com:5000/olm/far-edge-sriov-fec:v4.10
          updateStrategy:
            registryPoll:
              interval: 10m
      - fileName: AcceleratorsSubscription.yaml
        policyName: "subscriptions-fec-policy"
        spec:
          channel: "stable"
          source: certified-operators
3. Remove the specified subscription channels in the common PolicyGenTemplate CR, if they exist. The default subscription channels from the GitOps ZTP image are used for the update.
  Note
  The default channel for the Operators applied through GitOps ZTP 4.17 is stable, except for the performance-addon-operator. As of OpenShift Container Platform 4.11, the performance-addon-operator functionality was moved to the node-tuning-operator. For the 4.10 release, the default channel for PAO is v4.10. You can also specify the default channels in the common PolicyGenTemplate CR.
4. Push the PolicyGenTemplate CR updates to the GitOps ZTP Git repository.
  ArgoCD pulls the changes from the Git repository and generates the policies on the hub cluster.
5. Check the created policies by running the following command:
  $ oc get policies -A | grep -E "catsrc-policy|subscription"
Apply the required catalog source updates before starting the Operator update.
1. Save the content of the ClusterGroupUpgrade CR named cgu-operator-upgrade-prep with the catalog source policies and the target managed clusters to the cgu-operator-upgrade-prep.yml file:
  apiVersion: ran.openshift.io/v1alpha1
  kind: ClusterGroupUpgrade
  metadata:
    name: cgu-operator-upgrade-prep
    namespace: default
  spec:
    clusters:
      - spoke1
    enable: true
    managedPolicies:
      - du-upgrade-operator-catsrc-policy
    remediationStrategy:
      maxConcurrency: 1
2. Apply the ClusterGroupUpgrade CR to the hub cluster by running the following command:
  $ oc apply -f cgu-operator-upgrade-prep.yml
3. Monitor the update process. Upon completion, ensure that the policy is compliant by running the following command:
  $ oc get policies -A | grep -E "catsrc-policy"
Create the ClusterGroupUpgrade CR for the Operator update with the spec.enable field set to false.
1. Save the content of the Operator update ClusterGroupUpgrade CR with the du-upgrade-operator-catsrc-policy policy and the subscription policies created from the common PolicyGenTemplate and the target clusters to the cgu-operator-upgrade.yml file, as shown in the following example:
  apiVersion: ran.openshift.io/v1alpha1
  kind: ClusterGroupUpgrade
  metadata:
    name: cgu-operator-upgrade
    namespace: default
  spec:
    managedPolicies:
      - du-upgrade-operator-catsrc-policy # 1
      - common-subscriptions-policy # 2
    preCaching: false
    clusters:
      - spoke1
    remediationStrategy:
      maxConcurrency: 1
    enable: false
  1: The policy is needed by the image pre-caching feature to retrieve the Operator images from the catalog source.
  2: The policy contains Operator subscriptions. If you have followed the structure and content of the reference PolicyGenTemplates, all Operator subscriptions are grouped into the common-subscriptions-policy policy.
  Note
  One ClusterGroupUpgrade CR can only pre-cache the images of the desired Operators defined in the subscription policy from one catalog source included in the ClusterGroupUpgrade CR. If the desired Operators are from different catalog sources, such as in the example of the SRIOV-FEC Operator, another ClusterGroupUpgrade CR must be created with du-upgrade-fec-catsrc-policy and du-upgrade-subscriptions-fec-policy policies for the SRIOV-FEC Operator images pre-caching and update.
2. Apply the ClusterGroupUpgrade CR to the hub cluster by running the following command:
  $ oc apply -f cgu-operator-upgrade.yml

Optional: Pre-cache the images for the Operator update.

Before starting image pre-caching, verify that the subscription policy is NonCompliant by running the following command:

$ oc get policy common-subscriptions-policy -n <policy_namespace>

Example output

NAME                          REMEDIATION ACTION   COMPLIANCE STATE     AGE
common-subscriptions-policy   inform               NonCompliant         27d
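For a single namespaced policy, the compliance state is the third column of the output above. A hypothetical policy_state helper can extract it, so that a script can confirm the policy is NonCompliant before it enables pre-caching. This sketch assumes the column order shown in the example output.

```shell
# policy_state: read `oc get policy -n <namespace>` output on stdin and print
# the COMPLIANCE STATE column for the policy named $1.
policy_state() {
  awk -v p="$1" '$1 == p { print $3; found = 1 } END { exit (found ? 0 : 1) }'
}

# Example usage on the hub cluster:
#   oc get policy common-subscriptions-policy -n <policy_namespace> | policy_state common-subscriptions-policy
```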

Enable pre-caching in the ClusterGroupUpgrade CR by running the following command:

$ oc --namespace=default patch clustergroupupgrade.ran.openshift.io/cgu-operator-upgrade \
--patch '{"spec":{"preCaching": true}}' --type=merge

Monitor the process and wait for the pre-caching to complete. Check the status of pre-caching by running the following command on the hub cluster:
```
$ oc get cgu -n default cgu-operator-upgrade -o jsonpath='{.status.precaching.status}'
```

Check if the pre-caching is completed before starting the update by running the following command:

$ oc get cgu -n default cgu-operator-upgrade -ojsonpath='{.status.conditions}' | jq

Example output

[
    {
      "lastTransitionTime": "2022-03-08T20:49:08.000Z",
      "message": "The ClusterGroupUpgrade CR is not enabled",
      "reason": "UpgradeNotStarted",
      "status": "False",
      "type": "Ready"
    },
    {
      "lastTransitionTime": "2022-03-08T20:55:30.000Z",
      "message": "Precaching is completed",
      "reason": "PrecachingCompleted",
      "status": "True",
      "type": "PrecachingDone"
    }
]

Start the Operator update.
1. Enable the cgu-operator-upgrade ClusterGroupUpgrade CR and disable pre-caching to start the Operator update by running the following command:
  $ oc --namespace=default patch clustergroupupgrade.ran.openshift.io/cgu-operator-upgrade \
    --patch '{"spec":{"enable":true, "preCaching": false}}' --type=merge
2. Monitor the process. Upon completion, ensure that the policy is compliant by running the following command:
  $ oc get policies --all-namespaces

10.3.4. Troubleshooting missed Operator updates with PolicyGenTemplate CRs

In some scenarios, Topology Aware Lifecycle Manager (TALM) might miss Operator updates due to an out-of-date policy compliance state.

After a catalog source update, it takes time for the Operator Lifecycle Manager (OLM) to update the subscription status. The status of the subscription policy might continue to show as compliant while TALM decides whether remediation is needed. As a result, the Operator specified in the subscription policy does not get upgraded.

To avoid this scenario, add another catalog source configuration to the PolicyGenTemplate and specify this configuration in the subscription for any Operators that require an update.

Procedure

Add a catalog source configuration in the PolicyGenTemplate resource:

- fileName: DefaultCatsrc.yaml
  remediationAction: inform
  policyName: "operator-catsrc-policy"
  metadata:
    name: redhat-operators-disconnected
  spec:
    displayName: Red Hat Operators Catalog
    image: registry.example.com:5000/olm/redhat-operators-disconnected:v4.17
    updateStrategy:
      registryPoll:
        interval: 1h
  status:
    connectionState:
      lastObservedState: READY
- fileName: DefaultCatsrc.yaml
  remediationAction: inform
  policyName: "operator-catsrc-policy"
  metadata:
    name: redhat-operators-disconnected-v2 # 1
  spec:
    displayName: Red Hat Operators Catalog v2 # 2
    image: registry.example.com:5000/olm/redhat-operators-disconnected:<version> # 3
    updateStrategy:
      registryPoll:
        interval: 1h
  status:
    connectionState:
      lastObservedState: READY

1: Update the name for the new configuration.
2: Update the display name for the new configuration.
3: Update the index image URL. This fileName.spec.image field overrides any configuration in the DefaultCatsrc.yaml file.

Update the Subscription resource to point to the new configuration for Operators that require an update:
```
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: operator-subscription
  namespace: operator-namespace
# ...
spec:
  source: redhat-operators-disconnected-v2 # 1
# ...
```
1: Enter the name of the additional catalog source configuration that you defined in the PolicyGenTemplate resource.
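To catch a mismatch between the Subscription source and the catalog source names before you push, you can compare them with a small script. This is a line-based sketch with hypothetical helper names (subscription_source and catsrc_defined). It assumes that the source field appears as "source: <name>" and that each catalog source name appears on a "name: <name>" line in the PolicyGenTemplate file.

```shell
# subscription_source: print the value of the first "source:" key in a
# Subscription manifest read from stdin.
subscription_source() {
  awk '$1 == "source:" { print $2; exit }'
}

# catsrc_defined: succeed if the PolicyGenTemplate read from stdin contains a
# "name: $1" line (quotes around the value are allowed).
catsrc_defined() {
  grep -Eq "^[[:space:]]*name:[[:space:]]*\"?$1\"?[[:space:]]*\$"
}

# Example usage (file names are placeholders):
#   src=$(subscription_source < subscription.yaml)
#   catsrc_defined "$src" < du-upgrade.yaml || echo "catalog source $src is not defined"
```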

10.3.5. Performing a platform and an Operator update together

You can perform a platform and an Operator update at the same time.

Prerequisites

Install the Topology Aware Lifecycle Manager (TALM).
Update GitOps Zero Touch Provisioning (ZTP) to the latest version.
Provision one or more managed clusters with GitOps ZTP.
Log in as a user with cluster-admin privileges.
Create RHACM policies in the hub cluster.

Procedure

Create the PolicyGenTemplate CR for the updates by following the steps described in the "Performing a platform update" and "Performing an Operator update" sections.
Apply the preparation work for the platform and the Operator update.
1. Save the content of the ClusterGroupUpgrade CR with the policies for platform update preparation work, catalog source updates, and target clusters to the cgu-platform-operator-upgrade-prep.yml file, for example:
  apiVersion: ran.openshift.io/v1alpha1
  kind: ClusterGroupUpgrade
  metadata:
    name: cgu-platform-operator-upgrade-prep
    namespace: default
  spec:
    managedPolicies:
    - du-upgrade-platform-upgrade-prep
    - du-upgrade-operator-catsrc-policy
    clusterSelector:
    - group-du-sno
    remediationStrategy:
      maxConcurrency: 10
    enable: true
2. Apply the cgu-platform-operator-upgrade-prep.yml file to the hub cluster by running the following command:
  $ oc apply -f cgu-platform-operator-upgrade-prep.yml
3. Monitor the process. Upon completion, ensure that the policy is compliant by running the following command:
  $ oc get policies --all-namespaces
Create the ClusterGroupUpgrade CR for the platform and the Operator update with the spec.enable field set to false.
1. Save the contents of the platform and Operator update ClusterGroupUpgrade CR with the policies and the target clusters to the cgu-platform-operator-upgrade.yml file, as shown in the following example:
  apiVersion: ran.openshift.io/v1alpha1
  kind: ClusterGroupUpgrade
  metadata:
    name: cgu-du-upgrade
    namespace: default
  spec:
    managedPolicies:
    - du-upgrade-platform-upgrade # 1
    - du-upgrade-operator-catsrc-policy # 2
    - common-subscriptions-policy # 3
    preCaching: true
    clusterSelector:
    - group-du-sno
    remediationStrategy:
      maxConcurrency: 1
    enable: false
  1: This is the platform update policy.
  2: This is the policy containing the catalog source information for the Operators to be updated. The pre-caching feature needs it to determine which Operator images to download to the managed cluster.
  3: This is the policy to update the Operators.
2. Apply the cgu-platform-operator-upgrade.yml file to the hub cluster by running the following command:
  $ oc apply -f cgu-platform-operator-upgrade.yml
Optional: Pre-cache the images for the platform and the Operator update.
1. Enable pre-caching in the ClusterGroupUpgrade CR by running the following command:
  $ oc --namespace=default patch clustergroupupgrade.ran.openshift.io/cgu-du-upgrade \
    --patch '{"spec":{"preCaching": true}}' --type=merge
2. Monitor the update process and wait for the pre-caching to complete. Check the status of pre-caching by running the following command on the managed cluster:
  $ oc get jobs,pods -n openshift-talo-pre-cache
3. Check if the pre-caching is completed before starting the update by running the following command:
  $ oc get cgu cgu-du-upgrade -n default -ojsonpath='{.status.conditions}'
Start the platform and Operator update.
1. Enable the cgu-du-upgrade ClusterGroupUpgrade CR to start the platform and the Operator update by running the following command:
  $ oc --namespace=default patch clustergroupupgrade.ran.openshift.io/cgu-du-upgrade \
    --patch '{"spec":{"enable":true, "preCaching": false}}' --type=merge
2. Monitor the process. Upon completion, ensure that the policy is compliant by running the following command:
  $ oc get policies --all-namespaces
  Note
  You can also create the CRs for the platform and Operator updates with spec.enable: true from the start. In this case, the update starts immediately after pre-caching completes and you do not need to enable the CR manually.
  Both pre-caching and the update create extra resources, such as policies, placement bindings, placement rules, managed cluster actions, and managed cluster views, to help complete the procedures. Setting the spec.actions.afterCompletion.deleteObjects field to true deletes all these resources after the updates complete.
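As a sketch of the variant described in the note, the combined update CR could set spec.enable to true from the start and request cleanup of the helper resources through spec.actions.afterCompletion.deleteObjects, the same field that appears in the auto-created ClusterGroupUpgrade CR for GitOps ZTP. The CR name and policy names are reused from the example in this procedure; treat this as an illustration to adapt, not a tested configuration.

```
apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
  name: cgu-du-upgrade
  namespace: default
spec:
  managedPolicies:
  - du-upgrade-platform-upgrade
  - du-upgrade-operator-catsrc-policy
  - common-subscriptions-policy
  preCaching: true          # pre-cache first; the update starts when pre-caching completes
  clusterSelector:
  - group-du-sno
  remediationStrategy:
    maxConcurrency: 1
  enable: true              # no manual patch is needed to start the update
  actions:
    afterCompletion:
      deleteObjects: true   # delete helper policies, bindings, and views after completion
```

With this form, the enable and preCaching patch commands in the preceding steps are not needed.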

10.3.6. Removing Performance Addon Operator subscriptions from deployed clusters with PolicyGenTemplate CRs

In earlier versions of OpenShift Container Platform, the Performance Addon Operator provided automatic, low latency performance tuning for applications. In OpenShift Container Platform 4.11 or later, these functions are part of the Node Tuning Operator.

Do not install the Performance Addon Operator on clusters running OpenShift Container Platform 4.11 or later. If you upgrade to OpenShift Container Platform 4.11 or later, the Node Tuning Operator automatically removes the Performance Addon Operator.

Note

You need to remove any policies that create Performance Addon Operator subscriptions to prevent a re-installation of the Operator.

The reference DU profile includes the Performance Addon Operator in the PolicyGenTemplate CR common-ranGen.yaml. To remove the subscription from deployed managed clusters, you must update common-ranGen.yaml.

Note

If you install Performance Addon Operator 4.10.3-5 or later on OpenShift Container Platform 4.11 or later, the Performance Addon Operator detects the cluster version and automatically hibernates to avoid interfering with the Node Tuning Operator functions. However, to ensure best performance, remove the Performance Addon Operator from your OpenShift Container Platform 4.11 clusters.

Prerequisites

Create a Git repository where you manage your custom site configuration data. The repository must be accessible from the hub cluster and be defined as a source repository for ArgoCD.
Update to OpenShift Container Platform 4.11 or later.
Log in as a user with cluster-admin privileges.

Procedure

Change the complianceType to mustnothave for the Performance Addon Operator namespace, Operator group, and subscription in the common-ranGen.yaml file.

- fileName: PaoSubscriptionNS.yaml
  policyName: "subscriptions-policy"
  complianceType: mustnothave
- fileName: PaoSubscriptionOperGroup.yaml
  policyName: "subscriptions-policy"
  complianceType: mustnothave
- fileName: PaoSubscription.yaml
  policyName: "subscriptions-policy"
  complianceType: mustnothave

Merge the changes with your custom site repository and wait for the ArgoCD application to synchronize the change to the hub cluster. The status of the common-subscriptions-policy policy changes to Non-Compliant.
Apply the change to your target clusters by using the Topology Aware Lifecycle Manager. For more information about rolling out configuration changes, see the "Additional resources" section.
Monitor the process. When the status of the common-subscriptions-policy policy for a target cluster is Compliant, the Performance Addon Operator has been removed from the cluster. Get the status of the common-subscriptions-policy by running the following command:
```
$ oc get policy -n ztp-common common-subscriptions-policy
```
Delete the Performance Addon Operator namespace, Operator group, and subscription CRs from spec.sourceFiles in the common-ranGen.yaml file.
Merge the changes with your custom site repository and wait for the ArgoCD application to synchronize the change to the hub cluster. The policy remains compliant.

10.3.7. Removing Cluster Logging Operator artifacts with PolicyGenTemplate CRs

When you update to Cluster Logging Operator version 6.x from version 5.x, you must add a cleanup policy that removes the old Operator artifacts from deployed clusters after creating the new API custom resources (CRs).

Procedure

Add the group-du-clo5-cleanup-policy.yaml file to the Git repository that contains your other PolicyGenTemplate CRs.

Example cleanup files

---
apiVersion: policy.open-cluster-management.io/v1
kind: Policy
metadata:
    annotations:
        policy.open-cluster-management.io/categories: CM Configuration Management
        policy.open-cluster-management.io/controls: CM-2 Baseline Configuration
        policy.open-cluster-management.io/description: ""
        policy.open-cluster-management.io/standards: NIST SP 800-53
        ran.openshift.io/ztp-deploy-wave: "11"
    name: group-du-sno-clo5-cleanup
    namespace: ztp-group
spec:
    disabled: false
    policy-templates:
        - objectDefinition:
            apiVersion: policy.open-cluster-management.io/v1
            kind: ConfigurationPolicy
            metadata:
                name: group-du-sno-clo5-cleanup
            spec:
                evaluationInterval:
                    compliant: 10m
                    noncompliant: 10s
                namespaceSelector:
                    exclude:
                        - kube-*
                    include:
                        - '*'
                object-templates-raw: |
                    {{ if ne (default "" (lookup "apiextensions.k8s.io/v1" "CustomResourceDefinition" "" "clusterlogforwarders.logging.openshift.io").metadata.name) "" }}
                    - complianceType: mustnothave
                      objectDefinition:
                        apiVersion: logging.openshift.io/v1
                        kind: ClusterLogForwarder
                        metadata:
                          name: instance
                          namespace: openshift-logging
                    - complianceType: mustnothave
                      objectDefinition:
                        apiVersion: apiextensions.k8s.io/v1
                        kind: CustomResourceDefinition
                        metadata:
                          name: clusterlogforwarders.logging.openshift.io
                    {{ end }}
                    {{ if ne (default "" (lookup "apiextensions.k8s.io/v1" "CustomResourceDefinition" "" "clusterloggings.logging.openshift.io").metadata.name) "" }}
                    - complianceType: mustnothave
                      objectDefinition:
                        apiVersion: logging.openshift.io/v1
                        kind: ClusterLogging
                        metadata:
                          name: instance
                          namespace: openshift-logging
                    - complianceType: mustnothave
                      objectDefinition:
                        apiVersion: apiextensions.k8s.io/v1
                        kind: CustomResourceDefinition
                        metadata:
                          name: clusterloggings.logging.openshift.io
                    {{ end }}
                remediationAction: inform
                severity: low
    remediationAction: inform
---
apiVersion: apps.open-cluster-management.io/v1
kind: PlacementRule
metadata:
    name: placementrule-group-du-sno-clo5-cleanup
    namespace: ztp-group
spec:
    clusterSelector:
        matchExpressions:
            - key: group-du-sno
              operator: Exists
            - key: du-profile
              operator: In
              values:
                - latest
            - key: clo5-cleanup-done
              operator: DoesNotExist
---
apiVersion: policy.open-cluster-management.io/v1
kind: PlacementBinding
metadata:
    name: binding-group-du-sno-clo5-cleanup
    namespace: ztp-group
placementRef:
    apiGroup: apps.open-cluster-management.io
    kind: PlacementRule
    name: placementrule-group-du-sno-clo5-cleanup
subjects:
    - apiGroup: policy.open-cluster-management.io
      kind: Policy
      name: group-du-sno-clo5-cleanup

Update the PlacementRule in the cleanup policy file as needed to match the binding rules in use for the associated fleet of clusters.
Add the cleanup policy file to the resources section in the kustomization.yaml file.
Example kustomization.yaml file
```
# ...

resources:
- ns.yaml
# ...
- group-du-clo5-cleanup-policy.yaml
# ...
```
Commit the cleanup policy file and the kustomization.yaml file, and push them to your Git repository.
Wait for ArgoCD to generate and sync the cleanup policy to the hub cluster.
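The PlacementRule update step depends on the labels that your fleet uses. As a hedged sketch, if your PolicyGenTemplate CRs bind to clusters labeled du-profile: "4.17" instead of latest (a hypothetical value used only for illustration), the clusterSelector in the cleanup file could be adjusted as follows. The group-du-sno and clo5-cleanup-done expressions are kept unchanged from the example cleanup file.

```
spec:
  clusterSelector:
    matchExpressions:
      - key: group-du-sno
        operator: Exists
      - key: du-profile
        operator: In
        values:
          - "4.17"          # hypothetical label value; match your own binding rules
      - key: clo5-cleanup-done
        operator: DoesNotExist
```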

Verification

Verify that the cleanup policy is bound to all clusters that upgraded from Cluster Logging Operator version 5.x to 6.x.
Verify that the cleanup policy wave is after the group-du-sno configuration policy.

10.3.8. Pre-caching user-specified images with TALM on single-node OpenShift clusters

You can pre-cache application-specific workload images on single-node OpenShift clusters before upgrading your applications.

You can specify the configuration options for the pre-caching jobs by using the following custom resources (CRs):

PreCachingConfig CR
ClusterGroupUpgrade CR

Note

All fields in the PreCachingConfig CR are optional.
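Because every field is optional, a PreCachingConfig CR can set only what you need and let TALM derive the platform image, Operator indexes, and Operator packages from the managed cluster policies. The following minimal sketch, with a hypothetical CR name, only excludes images whose names match the listed patterns:

```
apiVersion: ran.openshift.io/v1alpha1
kind: PreCachingConfig
metadata:
  name: exclude-only-config   # hypothetical name
  namespace: exampleconfig-ns
spec:
  excludePrecachePatterns:    # skip images whose names match these patterns
    - aws
    - vsphere
```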

Example PreCachingConfig CR

apiVersion: ran.openshift.io/v1alpha1
kind: PreCachingConfig
metadata:
  name: exampleconfig
  namespace: exampleconfig-ns
spec:
  overrides: # 1
    platformImage: quay.io/openshift-release-dev/ocp-release@sha256:3d5800990dee7cd4727d3fe238a97e2d2976d3808fc925ada29c559a47e2e1ef
    operatorsIndexes:
      - registry.example.com:5000/custom-redhat-operators:1.0.0
    operatorsPackagesAndChannels:
      - local-storage-operator: stable
      - ptp-operator: stable
      - sriov-network-operator: stable
  spaceRequired: 30 Gi # 2
  excludePrecachePatterns: # 3
    - aws
    - vsphere
  additionalImages: # 4
    - quay.io/exampleconfig/application1@sha256:3d5800990dee7cd4727d3fe238a97e2d2976d3808fc925ada29c559a47e2e1ef
    - quay.io/exampleconfig/application2@sha256:3d5800123dee7cd4727d3fe238a97e2d2976d3808fc925ada29c559a47adfaef
    - quay.io/exampleconfig/applicationN@sha256:4fe1334adfafadsf987123adfffdaf1243340adfafdedga0991234afdadfsa09

1: By default, TALM automatically populates the platformImage, operatorsIndexes, and the operatorsPackagesAndChannels fields from the policies of the managed clusters. You can specify values to override the default TALM-derived values for these fields.
2: Specifies the minimum required disk space on the cluster. If unspecified, TALM defines a default value for OpenShift Container Platform images. The disk space field must include an integer value and the storage unit. For example: 40 GiB, 200 MB, 1 TiB.
3: Specifies the images to exclude from pre-caching based on image name matching.
4: Specifies the list of additional images to pre-cache.

Example ClusterGroupUpgrade CR with PreCachingConfig CR reference

apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
  name: cgu
spec:
  preCaching: true # 1
  preCachingConfigRef:
    name: exampleconfig # 2
    namespace: exampleconfig-ns # 3

1: The preCaching field set to true enables the pre-caching job.
2: The preCachingConfigRef.name field specifies the PreCachingConfig CR that you want to use.
3: The preCachingConfigRef.namespace specifies the namespace of the PreCachingConfig CR that you want to use.

10.3.8.1. Creating the custom resources for pre-caching

You must create the PreCachingConfig CR before or concurrently with the ClusterGroupUpgrade CR.

Create the PreCachingConfig CR with the list of additional images you want to pre-cache.

apiVersion: ran.openshift.io/v1alpha1
kind: PreCachingConfig
metadata:
  name: exampleconfig
  namespace: default # 1
spec:
  # ...
  spaceRequired: 30Gi # 2
  additionalImages:
    - quay.io/exampleconfig/application1@sha256:3d5800990dee7cd4727d3fe238a97e2d2976d3808fc925ada29c559a47e2e1ef
    - quay.io/exampleconfig/application2@sha256:3d5800123dee7cd4727d3fe238a97e2d2976d3808fc925ada29c559a47adfaef
    - quay.io/exampleconfig/applicationN@sha256:4fe1334adfafadsf987123adfffdaf1243340adfafdedga0991234afdadfsa09

1: The namespace must be accessible to the hub cluster.
2: Set the minimum required disk space field to ensure that there is sufficient storage space for the pre-cached images.

Create a ClusterGroupUpgrade CR with the preCaching field set to true and specify the PreCachingConfig CR created in the previous step:

apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
  name: cgu
  namespace: default
spec:
  clusters:
  - sno1
  - sno2
  preCaching: true
  preCachingConfigRef:
    name: exampleconfig
    namespace: default
  managedPolicies:
    - du-upgrade-platform-upgrade
    - du-upgrade-operator-catsrc-policy
    - common-subscriptions-policy
  remediationStrategy:
    timeout: 240

Warning

Once you install the images on the cluster, you cannot change or delete them.

When you want to start pre-caching the images, apply the ClusterGroupUpgrade CR by running the following command:
```
$ oc apply -f cgu.yaml
```

TALM verifies the ClusterGroupUpgrade CR.

From this point, you can continue with the TALM pre-caching workflow.

Note

All sites are pre-cached concurrently.

Verification

Check the pre-caching status on the hub cluster where the ClusterGroupUpgrade CR is applied by running the following command:

$ oc get cgu <cgu_name> -n <cgu_namespace> -oyaml

Example output

  precaching:
    spec:
      platformImage: quay.io/openshift-release-dev/ocp-release@sha256:3d5800990dee7cd4727d3fe238a97e2d2976d3808fc925ada29c559a47e2e1ef
      operatorsIndexes:
        - registry.example.com:5000/custom-redhat-operators:1.0.0
      operatorsPackagesAndChannels:
        - local-storage-operator: stable
        - ptp-operator: stable
        - sriov-network-operator: stable
      excludePrecachePatterns:
        - aws
        - vsphere
      additionalImages:
        - quay.io/exampleconfig/application1@sha256:3d5800990dee7cd4727d3fe238a97e2d2976d3808fc925ada29c559a47e2e1ef
        - quay.io/exampleconfig/application2@sha256:3d5800123dee7cd4727d3fe238a97e2d2976d3808fc925ada29c559a47adfaef
        - quay.io/exampleconfig/applicationN@sha256:4fe1334adfafadsf987123adfffdaf1243340adfafdedga0991234afdadfsa09
      spaceRequired: "30"
    status:
      sno1: Starting
      sno2: Starting

The pre-caching configurations are validated by checking if the managed policies exist. Valid configurations of the ClusterGroupUpgrade and the PreCachingConfig CRs result in the following statuses:

Example output of valid CRs

- lastTransitionTime: "2023-01-01T00:00:01Z"
  message: All selected clusters are valid
  reason: ClusterSelectionCompleted
  status: "True"
  type: ClusterSelected
- lastTransitionTime: "2023-01-01T00:00:02Z"
  message: Completed validation
  reason: ValidationCompleted
  status: "True"
  type: Validated
- lastTransitionTime: "2023-01-01T00:00:03Z"
  message: Precaching spec is valid and consistent
  reason: PrecacheSpecIsWellFormed
  status: "True"
  type: PrecacheSpecValid
- lastTransitionTime: "2023-01-01T00:00:04Z"
  message: Precaching in progress for 1 clusters
  reason: InProgress
  status: "False"
  type: PrecachingSucceeded

Example of an invalid PreCachingConfig CR

Type:    "PrecacheSpecValid"
Status:  False,
Reason:  "PrecacheSpecIncomplete"
Message: "Precaching spec is incomplete: failed to get PreCachingConfig resource due to PreCachingConfig.ran.openshift.io "<pre-caching_cr_name>" not found"

You can find the pre-caching job by running the following command on the managed cluster:

$ oc get jobs -n openshift-talo-pre-cache

Example of pre-caching job in progress

NAME        COMPLETIONS       DURATION      AGE
pre-cache   0/1               1s            1s

You can check the status of the pod created for the pre-caching job by running the following command:

$ oc describe pod pre-cache -n openshift-talo-pre-cache

Example of pre-caching job in progress

Type        Reason              Age    From              Message
Normal      SuccessfulCreate    19s    job-controller    Created pod: pre-cache-abcd1

You can get live updates on the status of the job by running the following command:
```
$ oc logs -f pre-cache-abcd1 -n openshift-talo-pre-cache
```

To verify the pre-cache job is successfully completed, run the following command:

$ oc describe pod pre-cache -n openshift-talo-pre-cache

Example of completed pre-cache job

Type        Reason              Age    From              Message
Normal      SuccessfulCreate    5m19s  job-controller    Created pod: pre-cache-abcd1
Normal      Completed           19s    job-controller    Job completed

To verify that the images are successfully pre-cached on the single-node OpenShift, do the following:
1. Start a debug session on the node:
  $ oc debug node/cnfdf00.example.lab
2. Change the root directory to the host:
  $ chroot /host/
3. Search for the desired images:
  $ sudo podman images | grep <operator_name>

10.3.9. About the auto-created ClusterGroupUpgrade CR for GitOps ZTP

TALM has a controller called ManagedClusterForCGU that monitors the Ready state of the ManagedCluster CRs on the hub cluster and creates the ClusterGroupUpgrade CRs for GitOps Zero Touch Provisioning (ZTP).

For any managed cluster in the Ready state without a ztp-done label applied, the ManagedClusterForCGU controller automatically creates a ClusterGroupUpgrade CR in the ztp-install namespace with its associated RHACM policies that are created during the GitOps ZTP process. TALM then remediates the set of configuration policies that are listed in the auto-created ClusterGroupUpgrade CR to push the configuration CRs to the managed cluster.

If there are no policies for the managed cluster when the cluster becomes Ready, TALM creates a ClusterGroupUpgrade CR with no policies. When that ClusterGroupUpgrade completes, the managed cluster is labeled ztp-done. If there are policies that you want to apply to that managed cluster, manually create a ClusterGroupUpgrade as a day-2 operation.

Example of an auto-created ClusterGroupUpgrade CR for GitOps ZTP

apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
  generation: 1
  name: spoke1
  namespace: ztp-install
  ownerReferences:
  - apiVersion: cluster.open-cluster-management.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: ManagedCluster
    name: spoke1
    uid: 98fdb9b2-51ee-4ee7-8f57-a84f7f35b9d5
  resourceVersion: "46666836"
  uid: b8be9cd2-764f-4a62-87d6-6b767852c7da
spec:
  actions:
    afterCompletion:
      addClusterLabels:
        ztp-done: "" # 1
      deleteClusterLabels:
        ztp-running: ""
      deleteObjects: true
    beforeEnable:
      addClusterLabels:
        ztp-running: "" # 2
  clusters:
  - spoke1
  enable: true
  managedPolicies:
  - common-spoke1-config-policy
  - common-spoke1-subscriptions-policy
  - group-spoke1-config-policy
  - spoke1-config-policy
  - group-spoke1-validator-du-policy
  preCaching: false
  remediationStrategy:
    maxConcurrency: 1
    timeout: 240

1: Applied to the managed cluster when TALM completes the cluster configuration.
2: Applied to the managed cluster when TALM starts deploying the configuration policies.
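For a cluster that was labeled ztp-done before its policies existed, a manually created day-2 ClusterGroupUpgrade could look like the following sketch. It reuses fields shown in the auto-created CR; the CR name, its namespace, and the policy name spoke1-day2-config-policy are hypothetical placeholders for your own values.

```
apiVersion: ran.openshift.io/v1alpha1
kind: ClusterGroupUpgrade
metadata:
  name: spoke1-day2                # hypothetical name
  namespace: default
spec:
  clusters:
  - spoke1
  enable: true                     # start remediation as soon as the CR is applied
  managedPolicies:
  - spoke1-day2-config-policy      # hypothetical policy created after ztp-done
  remediationStrategy:
    maxConcurrency: 1
    timeout: 240
```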
