Search

Chapter 13. Expanding single-node OpenShift clusters with GitOps ZTP

download PDF

You can expand single-node OpenShift clusters with GitOps Zero Touch Provisioning (ZTP). When you add worker nodes to single-node OpenShift clusters, the original single-node OpenShift cluster retains the control plane node role. Adding worker nodes does not require any downtime for the existing single-node OpenShift cluster.

Note

Although there is no specified limit on the number of worker nodes that you can add to a single-node OpenShift cluster, you must revaluate the reserved CPU allocation on the control plane node for the additional worker nodes.

If you require workload partitioning on the worker node, you must deploy and remediate the managed cluster policies on the hub cluster before installing the node. This way, the workload partitioning MachineConfig objects are rendered and associated with the worker machine config pool before the GitOps ZTP workflow applies the MachineConfig ignition file to the worker node.

It is recommended that you first remediate the policies, and then install the worker node. If you create the workload partitioning manifests after installing the worker node, you must drain the node manually and delete all the pods managed by daemon sets. When the managing daemon sets create the new pods, the new pods undergo the workload partitioning process.

Important

Adding worker nodes to single-node OpenShift clusters with GitOps ZTP is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.

For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.

Additional resources

13.1. Applying profiles to the worker node with PolicyGenerator or PolicyGenTemplate resources

You can configure the additional worker node with a DU profile.

You can apply a RAN distributed unit (DU) profile to the worker node cluster using the GitOps Zero Touch Provisioning (ZTP) common, group, and site-specific PolicyGenerator or PolicyGenTemplate resources. The GitOps ZTP pipeline that is linked to the ArgoCD policies application includes the following CRs that you can find in the relevant out/argocd/example folder when you extract the ztp-site-generate container:

/acmpolicygenerator resources
  • acm-common-ranGen.yaml
  • acm-group-du-sno-ranGen.yaml
  • acm-example-sno-site.yaml
  • ns.yaml
  • kustomization.yaml
/policygentemplates resources
  • common-ranGen.yaml
  • group-du-sno-ranGen.yaml
  • example-sno-site.yaml
  • ns.yaml
  • kustomization.yaml

Configuring the DU profile on the worker node is considered an upgrade. To initiate the upgrade flow, you must update the existing policies or create additional ones. Then, you must create a ClusterGroupUpgrade CR to reconcile the policies in the group of clusters.

13.2. Ensuring PTP and SR-IOV daemon selector compatibility

If the DU profile was deployed using the GitOps Zero Touch Provisioning (ZTP) plugin version 4.11 or earlier, the PTP and SR-IOV Operators might be configured to place the daemons only on nodes labeled as master. This configuration prevents the PTP and SR-IOV daemons from operating on the worker node. If the PTP and SR-IOV daemon node selectors are incorrectly configured on your system, you must change the daemons before proceeding with the worker DU profile configuration.

Procedure

  1. Check the daemon node selector settings of the PTP Operator on one of the spoke clusters:

    $ oc get ptpoperatorconfig/default -n openshift-ptp -ojsonpath='{.spec}' | jq

    Example output for PTP Operator

    {"daemonNodeSelector":{"node-role.kubernetes.io/master":""}} 1

    1
    If the node selector is set to master, the spoke was deployed with the version of the GitOps ZTP plugin that requires changes.
  2. Check the daemon node selector settings of the SR-IOV Operator on one of the spoke clusters:

    $  oc get sriovoperatorconfig/default -n \
    openshift-sriov-network-operator -ojsonpath='{.spec}' | jq

    Example output for SR-IOV Operator

    {"configDaemonNodeSelector":{"node-role.kubernetes.io/worker":""},"disableDrain":false,"enableInjector":true,"enableOperatorWebhook":true} 1

    1
    If the node selector is set to master, the spoke was deployed with the version of the GitOps ZTP plugin that requires changes.
  3. In the group policy, add the following complianceType and spec entries:

    spec:
        - fileName: PtpOperatorConfig.yaml
          policyName: "config-policy"
          complianceType: mustonlyhave
          spec:
            daemonNodeSelector:
              node-role.kubernetes.io/worker: ""
        - fileName: SriovOperatorConfig.yaml
          policyName: "config-policy"
          complianceType: mustonlyhave
          spec:
            configDaemonNodeSelector:
              node-role.kubernetes.io/worker: ""
    Important

    Changing the daemonNodeSelector field causes temporary PTP synchronization loss and SR-IOV connectivity loss.

  4. Commit the changes in Git, and then push to the Git repository being monitored by the GitOps ZTP ArgoCD application.

13.3. PTP and SR-IOV node selector compatibility

The PTP configuration resources and SR-IOV network node policies use node-role.kubernetes.io/master: "" as the node selector. If the additional worker nodes have the same NIC configuration as the control plane node, the policies used to configure the control plane node can be reused for the worker nodes. However, the node selector must be changed to select both node types, for example with the "node-role.kubernetes.io/worker" label.

13.4. Using PolicyGenerator CRs to apply worker node policies to worker nodes

You can create policies for worker nodes using PolicyGenerator CRs.

Procedure

  1. Create the following PolicyGenerator CR:

    apiVersion: policy.open-cluster-management.io/v1
    kind: PolicyGenerator
    metadata:
        name: example-sno-workers
    placementBindingDefaults:
        name: example-sno-workers-placement-binding
    policyDefaults:
        namespace: example-sno
        placement:
            labelSelector:
                matchExpressions:
                    - key: sites
                      operator: In
                      values:
                        - example-sno 1
        remediationAction: inform
        severity: low
        namespaceSelector:
            exclude:
                - kube-*
            include:
                - '*'
        evaluationInterval:
            compliant: 10m
            noncompliant: 10s
    policies:
        - name: example-sno-workers-config-policy
          policyAnnotations:
            ran.openshift.io/ztp-deploy-wave: "10"
          manifests:
            - path: source-crs/MachineConfigGeneric.yaml 2
              patches:
                - metadata:
                    labels:
                        machineconfiguration.openshift.io/role: worker 3
                    name: enable-workload-partitioning
                  spec:
                    config:
                        storage:
                            files:
                                - contents:
                                    source: data:text/plain;charset=utf-8;base64,W2NyaW8ucnVudGltZS53b3JrbG9hZHMubWFuYWdlbWVudF0KYWN0aXZhdGlvbl9hbm5vdGF0aW9uID0gInRhcmdldC53b3JrbG9hZC5vcGVuc2hpZnQuaW8vbWFuYWdlbWVudCIKYW5ub3RhdGlvbl9wcmVmaXggPSAicmVzb3VyY2VzLndvcmtsb2FkLm9wZW5zaGlmdC5pbyIKcmVzb3VyY2VzID0geyAiY3B1c2hhcmVzIiA9IDAsICJjcHVzZXQiID0gIjAtMyIgfQo=
                                  mode: 420
                                  overwrite: true
                                  path: /etc/crio/crio.conf.d/01-workload-partitioning
                                  user:
                                    name: root
                                - contents:
                                    source: data:text/plain;charset=utf-8;base64,ewogICJtYW5hZ2VtZW50IjogewogICAgImNwdXNldCI6ICIwLTMiCiAgfQp9Cg==
                                  mode: 420
                                  overwrite: true
                                  path: /etc/kubernetes/openshift-workload-pinning
                                  user:
                                    name: root
            - path: source-crs/PerformanceProfile-MCP-worker.yaml
              patches:
                - metadata:
                    name: openshift-worker-node-performance-profile
                  spec:
                    cpu: 4
                        isolated: 4-47
                        reserved: 0-3
                    hugepages:
                        defaultHugepagesSize: 1G
                        pages:
                            - count: 32
                              size: 1G
                    realTimeKernel:
                        enabled: true
            - path: source-crs/TunedPerformancePatch-MCP-worker.yaml
              patches:
                - metadata:
                    name: performance-patch-worker
                  spec:
                    profile:
                        - data: |
                          [main]
                          summary=Configuration changes profile inherited from performance created tuned
                          include=openshift-node-performance-openshift-worker-node-performance-profile
                          [bootloader]
                          cmdline_crash=nohz_full=4-47 5
                          [sysctl]
                          kernel.timer_migration=1
                          [scheduler]
                          group.ice-ptp=0:f:10:*:ice-ptp.*
                          [service]
                          service.stalld=start,enable
                          service.chronyd=stop,disable
                          name: performance-patch-worker
                    recommend:
                        - profile: performance-patch-worker
    1
    The policies are applied to all clusters with this label.
    2
    This generic MachineConfig CR is used to configure workload partitioning on the worker node.
    3
    The MCP field must be set to worker.
    4
    The cpu.isolated and cpu.reserved fields must be configured for each particular hardware platform.
    5
    The cmdline_crash CPU set must match the cpu.isolated set in the PerformanceProfile section.

    A generic MachineConfig CR is used to configure workload partitioning on the worker node. You can generate the content of crio and kubelet configuration files.

  2. Add the created policy template to the Git repository monitored by the ArgoCD policies application.
  3. Add the policy in the kustomization.yaml file.
  4. Commit the changes in Git, and then push to the Git repository being monitored by the GitOps ZTP ArgoCD application.
  5. To remediate the new policies to your spoke cluster, create a TALM custom resource:

    $ cat <<EOF | oc apply -f -
    apiVersion: ran.openshift.io/v1alpha1
    kind: ClusterGroupUpgrade
    metadata:
      name: example-sno-worker-policies
      namespace: default
    spec:
      backup: false
      clusters:
      - example-sno
      enable: true
      managedPolicies:
      - group-du-sno-config-policy
      - example-sno-workers-config-policy
      - example-sno-config-policy
      preCaching: false
      remediationStrategy:
        maxConcurrency: 1
    EOF

13.5. Using PolicyGenTemplate CRs to apply worker node policies to worker nodes

You can create policies for worker nodes using PolicyGenTemplate CRs.

Procedure

  1. Create the following PolicyGenTemplate CR:

    apiVersion: ran.openshift.io/v1
    kind: PolicyGenTemplate
    metadata:
      name: "example-sno-workers"
      namespace: "example-sno"
    spec:
      bindingRules:
        sites: "example-sno" 1
      mcp: "worker" 2
      sourceFiles:
        - fileName: MachineConfigGeneric.yaml 3
          policyName: "config-policy"
          metadata:
            labels:
              machineconfiguration.openshift.io/role: worker
            name: enable-workload-partitioning
          spec:
            config:
              storage:
                files:
                - contents:
                    source: data:text/plain;charset=utf-8;base64,W2NyaW8ucnVudGltZS53b3JrbG9hZHMubWFuYWdlbWVudF0KYWN0aXZhdGlvbl9hbm5vdGF0aW9uID0gInRhcmdldC53b3JrbG9hZC5vcGVuc2hpZnQuaW8vbWFuYWdlbWVudCIKYW5ub3RhdGlvbl9wcmVmaXggPSAicmVzb3VyY2VzLndvcmtsb2FkLm9wZW5zaGlmdC5pbyIKcmVzb3VyY2VzID0geyAiY3B1c2hhcmVzIiA9IDAsICJjcHVzZXQiID0gIjAtMyIgfQo=
                  mode: 420
                  overwrite: true
                  path: /etc/crio/crio.conf.d/01-workload-partitioning
                  user:
                    name: root
                - contents:
                    source: data:text/plain;charset=utf-8;base64,ewogICJtYW5hZ2VtZW50IjogewogICAgImNwdXNldCI6ICIwLTMiCiAgfQp9Cg==
                  mode: 420
                  overwrite: true
                  path: /etc/kubernetes/openshift-workload-pinning
                  user:
                    name: root
        - fileName: PerformanceProfile.yaml
          policyName: "config-policy"
          metadata:
            name: openshift-worker-node-performance-profile
          spec:
            cpu: 4
              isolated: "4-47"
              reserved: "0-3"
            hugepages:
              defaultHugepagesSize: 1G
              pages:
                - size: 1G
                  count: 32
            realTimeKernel:
              enabled: true
        - fileName: TunedPerformancePatch.yaml
          policyName: "config-policy"
          metadata:
            name: performance-patch-worker
          spec:
            profile:
              - name: performance-patch-worker
                data: |
                  [main]
                  summary=Configuration changes profile inherited from performance created tuned
                  include=openshift-node-performance-openshift-worker-node-performance-profile
                  [bootloader]
                  cmdline_crash=nohz_full=4-47 5
                  [sysctl]
                  kernel.timer_migration=1
                  [scheduler]
                  group.ice-ptp=0:f:10:*:ice-ptp.*
                  [service]
                  service.stalld=start,enable
                  service.chronyd=stop,disable
            recommend:
            - profile: performance-patch-worker
    1
    The policies are applied to all clusters with this label.
    2
    The MCP field must be set to worker.
    3
    This generic MachineConfig CR is used to configure workload partitioning on the worker node.
    4
    The cpu.isolated and cpu.reserved fields must be configured for each particular hardware platform.
    5
    The cmdline_crash CPU set must match the cpu.isolated set in the PerformanceProfile section.

    A generic MachineConfig CR is used to configure workload partitioning on the worker node. You can generate the content of crio and kubelet configuration files.

  2. Add the created policy template to the Git repository monitored by the ArgoCD policies application.
  3. Add the policy in the kustomization.yaml file.
  4. Commit the changes in Git, and then push to the Git repository being monitored by the GitOps ZTP ArgoCD application.
  5. To remediate the new policies to your spoke cluster, create a TALM custom resource:

    $ cat <<EOF | oc apply -f -
    apiVersion: ran.openshift.io/v1alpha1
    kind: ClusterGroupUpgrade
    metadata:
      name: example-sno-worker-policies
      namespace: default
    spec:
      backup: false
      clusters:
      - example-sno
      enable: true
      managedPolicies:
      - group-du-sno-config-policy
      - example-sno-workers-config-policy
      - example-sno-config-policy
      preCaching: false
      remediationStrategy:
        maxConcurrency: 1
    EOF

13.6. Adding worker nodes to single-node OpenShift clusters with GitOps ZTP

You can add one or more worker nodes to existing single-node OpenShift clusters to increase available CPU resources in the cluster.

Prerequisites

  • Install and configure RHACM 2.6 or later in an OpenShift Container Platform 4.11 or later bare-metal hub cluster
  • Install Topology Aware Lifecycle Manager in the hub cluster
  • Install Red Hat OpenShift GitOps in the hub cluster
  • Use the GitOps ZTP ztp-site-generate container image version 4.12 or later
  • Deploy a managed single-node OpenShift cluster with GitOps ZTP
  • Configure the Central Infrastructure Management as described in the RHACM documentation
  • Configure the DNS serving the cluster to resolve the internal API endpoint api-int.<cluster_name>.<base_domain>

Procedure

  1. If you deployed your cluster by using the example-sno.yaml SiteConfig manifest, add your new worker node to the spec.clusters['example-sno'].nodes list:

    nodes:
    - hostName: "example-node2.example.com"
      role: "worker"
      bmcAddress: "idrac-virtualmedia+https://[1111:2222:3333:4444::bbbb:1]/redfish/v1/Systems/System.Embedded.1"
      bmcCredentialsName:
        name: "example-node2-bmh-secret"
      bootMACAddress: "AA:BB:CC:DD:EE:11"
      bootMode: "UEFI"
      nodeNetwork:
        interfaces:
          - name: eno1
            macAddress: "AA:BB:CC:DD:EE:11"
        config:
          interfaces:
            - name: eno1
              type: ethernet
              state: up
              macAddress: "AA:BB:CC:DD:EE:11"
              ipv4:
                enabled: false
              ipv6:
                enabled: true
                address:
                - ip: 1111:2222:3333:4444::1
                  prefix-length: 64
          dns-resolver:
            config:
              search:
              - example.com
              server:
              - 1111:2222:3333:4444::2
          routes:
            config:
            - destination: ::/0
              next-hop-interface: eno1
              next-hop-address: 1111:2222:3333:4444::1
              table-id: 254
  2. Create a BMC authentication secret for the new host, as referenced by the bmcCredentialsName field in the spec.nodes section of your SiteConfig file:

    apiVersion: v1
    data:
      password: "password"
      username: "username"
    kind: Secret
    metadata:
      name: "example-node2-bmh-secret"
      namespace: example-sno
    type: Opaque
  3. Commit the changes in Git, and then push to the Git repository that is being monitored by the GitOps ZTP ArgoCD application.

    When the ArgoCD cluster application synchronizes, two new manifests appear on the hub cluster generated by the GitOps ZTP plugin:

    • BareMetalHost
    • NMStateConfig

      Important

      The cpuset field should not be configured for the worker node. Workload partitioning for worker nodes is added through management policies after the node installation is complete.

Verification

You can monitor the installation process in several ways.

  • Check if the preprovisioning images are created by running the following command:

    $ oc get ppimg -n example-sno

    Example output

    NAMESPACE       NAME            READY   REASON
    example-sno     example-sno     True    ImageCreated
    example-sno     example-node2   True    ImageCreated

  • Check the state of the bare-metal hosts:

    $ oc get bmh -n example-sno

    Example output

    NAME            STATE          CONSUMER   ONLINE   ERROR   AGE
    example-sno     provisioned               true             69m
    example-node2   provisioning              true             4m50s 1

    1
    The provisioning state indicates that node booting from the installation media is in progress.
  • Continuously monitor the installation process:

    1. Watch the agent install process by running the following command:

      $ oc get agent -n example-sno --watch

      Example output

      NAME                                   CLUSTER   APPROVED   ROLE     STAGE
      671bc05d-5358-8940-ec12-d9ad22804faa   example-sno   true       master   Done
      [...]
      14fd821b-a35d-9cba-7978-00ddf535ff37   example-sno   true       worker   Starting installation
      14fd821b-a35d-9cba-7978-00ddf535ff37   example-sno   true       worker   Installing
      14fd821b-a35d-9cba-7978-00ddf535ff37   example-sno   true       worker   Writing image to disk
      [...]
      14fd821b-a35d-9cba-7978-00ddf535ff37   example-sno   true       worker   Waiting for control plane
      [...]
      14fd821b-a35d-9cba-7978-00ddf535ff37   example-sno   true       worker   Rebooting
      14fd821b-a35d-9cba-7978-00ddf535ff37   example-sno   true       worker   Done

    2. When the worker node installation is finished, the worker node certificates are approved automatically. At this point, the worker appears in the ManagedClusterInfo status. Run the following command to see the status:

      $ oc get managedclusterinfo/example-sno -n example-sno -o \
      jsonpath='{range .status.nodeList[*]}{.name}{"\t"}{.conditions}{"\t"}{.labels}{"\n"}{end}'

      Example output

      example-sno	[{"status":"True","type":"Ready"}]	{"node-role.kubernetes.io/master":"","node-role.kubernetes.io/worker":""}
      example-node2	[{"status":"True","type":"Ready"}]	{"node-role.kubernetes.io/worker":""}

Red Hat logoGithubRedditYoutubeTwitter

Learn

Try, buy, & sell

Communities

About Red Hat Documentation

We help Red Hat users innovate and achieve their goals with our products and services with content they can trust.

Making open source more inclusive

Red Hat is committed to replacing problematic language in our code, documentation, and web properties. For more details, see the Red Hat Blog.

About Red Hat

We deliver hardened solutions that make it easier for enterprises to work across platforms and environments, from the core datacenter to the network edge.

© 2024 Red Hat, Inc.