Chapter 1. Hosted control planes release notes
With this release, hosted control planes for OpenShift Container Platform 4.22 is available. Hosted control planes for OpenShift Container Platform 4.22 supports multicluster engine for Kubernetes Operator version 2.12.
1.1. New features and enhancements
This release adds improvements related to the following components and concepts:
- Monitor connectivity from the data plane to the control plane
- In this release, you can monitor connectivity from the data plane to the control plane by using the `ControlPlaneConnectionAvailable` condition. For more information, see Connectivity monitoring from the data plane to the control plane.
- Implement network segmentation for hosted clusters
- In this release, you can configure network isolation for hosted clusters with container-based isolation, VM-based isolation, or physical isolation. For more information, see Network isolation for hosted clusters.
- Enable Amazon Spot Instance support
- In this release, you can enable Amazon Spot Instance support for compute nodes to reduce cloud infrastructure costs. Amazon Spot Instances are suitable for hosted cluster workloads that are fault-tolerant, stateless, and flexible. For more information, see Amazon Spot Instance support for node pools.
- Back up etcd data for hosted control planes by using the etcd snapshot method (Technology Preview)
- As an alternative for the default volume snapshot approach, you can use the etcd snapshot approach to back up and restore etcd data for hosted control planes. The etcd snapshot method is a Technology Preview feature. For more information, see Backing up etcd data for hosted control planes by using the etcd snapshot method.
- Deploy self-managed hosted control planes on Microsoft Azure (Technology Preview)
- In this release, you can create public or private hosted clusters on Azure as a Technology Preview feature. For more information, see Deploying hosted control planes on Azure.
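As a rough illustration of consuming the connectivity condition from the first item in this list, the following Python sketch looks up a condition by type in a Kubernetes-style conditions list. The sample payload, the helper names `find_condition` and `is_connected`, and the reason and message values are assumptions made for illustration. The release note does not state which resource carries the `ControlPlaneConnectionAvailable` condition, so check the linked documentation before relying on this.

```python
import json

# Hypothetical sample of a status.conditions list; all values are illustrative only.
SAMPLE_STATUS = json.loads("""
{
  "conditions": [
    {"type": "Available", "status": "True"},
    {"type": "ControlPlaneConnectionAvailable", "status": "False",
     "reason": "ExampleReason", "message": "example message"}
  ]
}
""")


def find_condition(status, condition_type):
    """Return the first condition dict whose type matches, or None."""
    for condition in status.get("conditions", []):
        if condition.get("type") == condition_type:
            return condition
    return None


def is_connected(status):
    """True only when the condition is present and its status is "True"."""
    condition = find_condition(status, "ControlPlaneConnectionAvailable")
    return condition is not None and condition.get("status") == "True"


if __name__ == "__main__":
    print(is_connected(SAMPLE_STATUS))
```

A missing condition is treated the same as a failing one, which is the conservative choice for an alerting check.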
1.2. Fixed issues
The following issues are fixed for this release:
- Before this update, services in the hosted control plane namespace, such as the `aws-ebs-csi-driver-controller-metrics` service, used the `service-ca` annotation (`service.beta.openshift.io/serving-cert-secret-name`) to generate TLS certificates. As a consequence, control plane services incorrectly depended on the OpenShift Service CA Operator in the hosted cluster for certificate generation, which weakened the security boundary between the control plane and the hosted cluster. With this release, the Control Plane Operator creates and manages TLS certificates for the `aws-ebs-csi-driver-controller-metrics` service directly, signed by the hosted control plane root CA, eliminating the dependency on the OpenShift Service CA Operator. The implementation checks for `service-ca` annotations to ensure a smooth upgrade path from older deployments. As a result, control plane isolation and certificate lifecycle management are improved. (OCPBUGS-34662)
- Before this update, the HyperShift Operator metrics collector validated the proxy CA bundle certificates on every metrics collection cycle. As a consequence, when a certificate in the CA bundle expired, repeated `proxy ca bundle is invalid` messages were posted in the HyperShift Operator logs without identifying the hosted cluster, making it difficult to diagnose the cluster with the invalid proxy CA certificate. With this release, certificate validation is moved to the `HostedCluster` reconcile loop, and a new `ValidProxyConfiguration` condition is added to the `HostedCluster` API. The metrics collector now reads the validation result from the condition instead of directly performing validation. As a result, the metrics collector no longer posts repeated messages in the logs, and affected clusters can be identified. (OCPBUGS-55151)
- Before this update, KubeVirt virtual machines (VMs) used in node pools were not configured with an external eviction strategy. As a consequence, the Cluster API Provider for KubeVirt controller did not detect eviction requests during node drains on the underlying infrastructure cluster. Node drains were not coordinated properly, the hosted control plane function was disrupted, and pods failed when node pool VMs were shut down. With this release, KubeVirt VMs are configured with the external eviction strategy in the VM template specification. As a result, the Cluster API Provider for KubeVirt can detect eviction events and coordinate the draining of hosted cluster nodes during infrastructure operations. For VMs that support live migration, the Cluster API Provider for KubeVirt skips the drain process and allows the VMs to be migrated without disruption. (OCPBUGS-58397)
- Before this update, when you removed the `additionalTrustBundle` field from the `HostedCluster` specification, the `additionalTrustBundle` certificate was not removed from the `user-ca-bundle` config map. As a consequence, it appeared that the `additionalTrustBundle` certificate was not removed from the hosted clusters. With this release, the reconciliation logic ensures that the `user-ca-bundle` config map is deleted from the hosted cluster when you delete the `additionalTrustBundle` field. As a result, when you delete the `additionalTrustBundle` field from the `HostedCluster` specification, the certificate is removed, improving security and consistency. (OCPBUGS-60707)
- Before this update, the control plane deployments related to Cluster API (`cluster-api` and `capi-provider`) in the hosted control plane namespace lacked finalizers. As a consequence, if these deployments were deleted before the `HostedCluster` resource was deleted, the controller pods would stop running before they could process finalizers on their managed Cluster API resources (`Machine` objects, `MachineDeployment` objects, platform-specific infrastructure objects), leading to orphaned cloud resources such as EC2 instances, VMs, disks, and load balancers. With this update, the HyperShift Operator adds a finalizer, `hypershift.openshift.io/component-finalizer`, to the `cluster-api` and `capi-provider` deployments. The finalizer is only removed after the underlying infrastructure resources have been deleted during `HostedCluster` teardown. As a result, accidental deletion of these deployments is blocked until Cluster API resources are properly cleaned up, preventing orphaned cloud resources. (OCPBUGS-63452)
- Before this update, when the `ValidAWSIdentityProvider` condition was copied from the control plane to the hosted cluster, the logic preserved the earlier status if the new condition was `Unknown`. As a consequence, when the earlier condition was `True` and the new condition was `Unknown`, the update was skipped. With this release, the condition on the hosted cluster correctly reflects the current health of the AWS Identity Provider referenced in the cloud credentials. (OCPBUGS-66325)
- Before this update, the Cluster Network Operator failed to recognize KubeVirt as a supported platform for hosted control planes with dual-stack networking. As a consequence, on deployments of hosted control planes on OpenShift Virtualization with dual-stack networking, the Cluster Network Operator deployment failed. With this release, the Cluster Network Operator recognizes KubeVirt as a supported platform for hosted control planes with dual-stack networking. As a result, deploying hosted control planes on OpenShift Virtualization with IPv4/IPv6 dual-stack networking succeeds. (OCPBUGS-66417)
- Before this update, the cluster autoscaler did not include the `hypershift.openshift.io/nodepool-globalps-enabled` label in its `--balancing-ignore-label` list. As a consequence, when the autoscaler balanced node groups, it treated nodes with and without this label as belonging to different groups, causing uneven scaling across nodes in the same `NodePool` object. With this update, the `hypershift.openshift.io/nodepool-globalps-enabled` label is added to the balancing ignore list of the autoscaler. As a result, the autoscaler distributes new nodes evenly across node groups regardless of the Global Pull Secret eligibility label. (OCPBUGS-73817)
- Before this update, when you created a hosted cluster that used a `NodePort` publishing strategy, specifying a port outside the Kubernetes service node port range, such as `10000`, was silently accepted during cluster creation. As a consequence, the cluster installation got stuck with only 3 pods in the hosted cluster namespace, because the Control Plane Operator rejected the port for being outside the acceptable range of `30000-32767`, causing a late failure after resources were already provisioned. With this release, early validation is added for the `NodePort.Port` value against the configured `ServiceNodePortRange` parameter of the cluster. Invalid values are rejected upfront with a clear message indicating the allowed range. As a result, you receive an immediate validation error when you specify a `NodePort` that is outside the acceptable range, and avoid stuck cluster installations. (OCPBUGS-65842)
- Before this update, the `hypershift.openshift.io/nodepool-globalps-enabled` label was applied to nodes by the Hosted Cluster Config Operator `globalps` controller, which discovered eligible nodes by querying `MachineSet` objects and `Machine` objects during its periodic reconciliation. As a consequence, when a new `Replace` node joined the cluster, the `global-pull-secret-syncer` DaemonSet pod could not schedule on it until the next reconcile cycle of the `globalps` controller, causing a delay of up to 15 minutes. With this update, the label is set directly on Cluster API `Machine` objects by the HyperShift Operator during `MachineDeployment` reconciliation, so it propagates to nodes at creation time by using the Hosted Cluster Config Operator Node controller. As a result, new `Replace` nodes on AWS are immediately eligible for the `global-pull-secret-syncer` DaemonSet, eliminating the scheduling delay. (OCPBUGS-77966)
- Before this update, the ignition server deployment computed registry overrides by performing live HTTP registry connectivity checks (`LookupMappedImage`/`GetMetadata`) during every Control Plane Operator reconciliation. As a consequence, network conditions caused the `--registry-overrides` argument and `MIRRORED_RELEASE_IMAGE` environment variable to return different values on each reconciliation, triggering constant deployment regenerations and pod restarts. With this update, the ignition server deployment uses the static registry overrides from the `HostedCluster` specification instead of performing live registry lookups at deploy time. The ignition server already resolves per-image mirrors at runtime by using its own override logic. As a result, ignition server deployments remain stable with consistent configuration, eliminating unnecessary pod restarts. (OCPBUGS-60185)
- Before this update, when the `allowedCIDRBlocks` parameter was removed from the `HostedCluster` specification, the `LoadBalancerSourceRanges` field on the external router `LoadBalancer` service was not cleared. As a consequence, stale Classless Inter-Domain Routing (CIDR) restrictions remained on the router service after the administrator removed the access restrictions, continuing to block traffic that should have been allowed. With this update, the reconciliation logic always sets the `LoadBalancerSourceRanges` field on the external router service to match the current `allowedCIDRBlocks` value, including clearing it when the list is empty. As a result, removing the `allowedCIDRBlocks` parameter from the `HostedCluster` specification correctly removes the CIDR restrictions from the router service. (OCPBUGS-69761)
- Before this update, the `HostedControlPlane` controller set the `HostedControlPlaneAvailable` condition to `True` after the Kubernetes API server was reachable, without verifying that all control plane components had finished rolling out. As a consequence, customers could interact with the cluster before components such as the `kube-controller-manager`, `oauth-server`, or `kube-scheduler` were fully ready, which could lead to failures or unexpected behavior. With this update, the controller now lists all control plane component resources in the hosted control plane namespace and verifies that each has its `Available` condition set to `True` before setting the `HostedControlPlaneAvailable` condition to `True`. If any components are not yet available, the condition reports the `ComponentsNotAvailable` reason with a message listing the pending components. After the cluster reaches the available state, later component rollouts, such as during upgrades, do not flip the condition back to `False`. As a result, the hosted control plane now only reports `Available=True` after all control plane components have completed their initial rollout, ensuring a more reliable user experience. (OCPBUGS-74648)
- Before this update, the Hosted Cluster Config Operator contained logic that modified the `openshift-controller-manager-config` config map to disable the `serviceaccount-pull-secrets` controller when the `managementState` parameter of the image registry was set to `Removed`. In OpenShift Container Platform 4.20 and later, Control Plane Operator v2 started managing this config map, but the Hosted Cluster Config Operator continued modifying it on every reconciliation cycle. As a consequence, the `openshift-controller-manager-config` config map was updated by Hosted Cluster Config Operator every minute, which triggered the `openshift-controller-manager` file observer to detect changes and restart pods. This behavior caused constant `openshift-controller-manager` pod restarts. With this release, the OpenShift Controller Manager config update logic is removed from the Hosted Cluster Config Operator because Control Plane Operator v2 manages the `openshift-controller-manager-config` config map. As a result, the `openshift-controller-manager` pods no longer experience unnecessary restarts. (OCPBUGS-74931)
- Before this update, during the backup and restore process with OADP, the token secret was deleted before the `NodePool` object was restored. Then, the `NodePool` controller created a token secret without the `ignition-reached` annotation. Because nodes were already running, they did not contact the ignition endpoint again, so the annotation was never set back. As a consequence, the `ReachedIgnitionEndpoint` condition stayed `False`, blocking machine health check creation and disabling auto-repair for the restored node pools. With this release, when the `HostedCluster` object has the `hypershift.openshift.io/restored-from-backup` annotation set by the OADP plugin, the token secret is created with the `ignition-reached=True` parameter, preserving the condition across the restore process. As a result, after a backup and restore process, node pools correctly report `ReachedIgnitionEndpoint=True` so that the machine health check and auto-repair work as expected. (OCPBUGS-77621)
- Before this update, when deploying hosted clusters with a 4.21 or later payload, the HyperShift Operator used hard-coded `quay.io` image references for the Cluster API manager and platform-specific Cluster API provider containers. These hard-coded images bypassed the standard release payload image lookup, which respects `ImageContentSourcePolicies` (ICSPs) and `ImageDigestMirrorSets` (IDMSs). As a consequence, in disconnected or mirrored environments, Cluster API images were always pulled directly from `quay.io` even when registry overrides were configured, causing image pull failures and preventing cluster creation. With this update, the backward-compatible Cluster API image references are resolved by looking up the component from a pinned 4.20.10 release payload through the standard release image provider, which correctly follows registry override configuration. As a result, Cluster API images are pulled from the correct mirror registry in disconnected environments. For this fix to work, the 4.20.10 release payload from the `quay.io/openshift-release-dev/ocp-release:4.20.10-multi` image must be mirrored to the target mirror registry. (OCPBUGS-74247)
- Before this update, requests from the Kubernetes API server bootstrap container were denied by a validating admission policy that restricts feature gate changes to a specific user. As a consequence, the bootstrap container was unable to apply feature gate changes, causing control plane issues. With this release, a dedicated identity is created for the Kubernetes API server bootstrap container and is allow-listed in the policy. As a result, the bootstrap container can apply feature gate changes without being denied by the validating admission policy. (OCPBUGS-50603)
- Before this update, when a predicate of a Control Plane Operator v2 component evaluated to `false`, the framework tried to look up and clean up the associated resource by using the cached client. For resource types not installed on the management cluster, such as the `SecretProviderClass` custom resource definition of the Secrets Store CSI driver, this caused the cached client to create an informer that retried list and watch actions indefinitely, blocking all control plane reconciliation. As a consequence, hosted cluster creation failed on management clusters that did not have the Secrets Store CSI driver custom resource definition installed. With this update, the Control Plane Operator probes whether a resource type is accessible on the management cluster before trying to interact with it. If the custom resource definition is not installed or the operator lacks role-based access permission, the operation is skipped gracefully and the result is cached. As a result, hosted cluster creation succeeds even when optional custom resource definitions such as the Secrets Store CSI driver are not present on the management cluster. (OCPBUGS-65687)
- Before this update, when using a custom API server DNS name with external DNS, the `kubeconfig` secret contained an incorrect port. As a consequence, connections to the API server failed with reset errors. With this update, the `kubeconfig` uses the correct port for the configured DNS setup. As a result, external DNS connections work as expected. (OCPBUGS-72258)
- Before this update, a race condition in `VolumeSnapshot` processing, where a snapshot was deleted between listing and retrieving, was treated as an unrecoverable error, ending the processing of remaining snapshots. As a consequence, about 25% of scheduled backups failed intermittently and were marked as `PartiallyFailed` with missing etcd PVC data. With this release, deleted snapshots are gracefully skipped instead of treated as unrecoverable errors, allowing the remaining snapshots to be processed normally. As a result, backups are completed successfully even when snapshot cleanup races with plugin processing. (OCPBUGS-75913)
- Before this update, when the scale-from-zero feature was enabled in AWS and a node pool used the `InPlace` node upgrade type with autoscaling set to `min=0`, the scale-from-zero implementation did not support the `InPlace` upgrade strategy. The original implementation used a machine deployment controller approach that only worked with the `Replace` upgrade strategy. As a consequence, new workloads did not trigger node pool scale-up from zero when using the `InPlace` upgrade type, preventing nodes from being created even when pods were pending. With this release, the scale-from-zero implementation uses a generic provider pattern that works with all upgrade types. As a result, node pools that use the `InPlace` upgrade type can scale up from zero when workload demands require additional capacity. The autoscaler correctly provisions nodes regardless of the upgrade strategy. (OCPBUGS-70320)
- Before this update, a race condition in the `globalps` controller skipped labeling new nodes, causing a delay in `global-pull-secret-syncer` pod scheduling. As a consequence, users experienced image pull failures from private registries on new nodes. With this release, the race condition in the Hosted Cluster Config Operator is fixed, which resolves the scheduling delay. As a result, the `global-pull-secret-syncer` pod now schedules immediately on new nodes, ensuring timely access to private images. (OCPBUGS-77254)
- Before this update, the `IsCloudAPI` method for the `Konnectivity` proxy did not include Amazon Web Services (AWS) ISO region domain suffixes, such as `.c2s.ic.gov`, `.hci.ic.gov`, or `.sc2s.sgov.gov`, in its cloud API detection lists. As a consequence, the Ingress Operator could not add AWS ISO domains to the `NO_PROXY` list, blocking direct communication with endpoints. With this release, the AWS ISO suffixes are added to the `IsCloudAPI` detection list. As a result, the `Konnectivity` proxy correctly identifies the AWS ISO region endpoints as cloud APIs, so that the Ingress Operator can route traffic to those domains directly. (OCPBUGS-85779)
- Before this update, when you deployed a hosted cluster on OpenShift Virtualization with external infrastructure, the `virt-launcher` network policy was not created on the infrastructure cluster where the KubeVirt `virt-launcher` pods and virtual machines (VMs) run. As a consequence, the KubeVirt VMs had unrestricted network access to all pods and services on the infrastructure cluster, breaking tenant isolation. With this release, the `virt-launcher` network policy is created with CIDR-based egress restrictions. As a result, multitenant isolation is no longer compromised. (OCPBUGS-78575)
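The early `NodePort` validation described for OCPBUGS-65842 amounts to a range check that runs before any resources are provisioned. The following Python sketch shows the idea only. The function name `validate_node_port`, the error wording, and the default range string (taken from the `30000-32767` range quoted in that note) are assumptions for illustration and are not the HyperShift implementation.

```python
def validate_node_port(port, port_range="30000-32767"):
    """Return port if it lies inside the inclusive 'low-high' range.

    Raises ValueError with the allowed range otherwise, so that the caller
    can fail fast instead of discovering the problem after provisioning.
    """
    low_text, high_text = port_range.split("-", 1)
    low, high = int(low_text), int(high_text)
    if not low <= port <= high:
        raise ValueError(
            f"NodePort {port} is outside the allowed range {low}-{high}"
        )
    return port


if __name__ == "__main__":
    for candidate in (30000, 10000):
        try:
            validate_node_port(candidate)
            print(f"{candidate}: accepted")
        except ValueError as err:
            print(f"{candidate}: rejected ({err})")
```

Passing the range as a parameter mirrors the note's point that validation is performed against the range configured for the cluster, not a fixed constant.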
1.3. Technology Preview features status
Some features in this release are currently in Technology Preview. These experimental features are not intended for production use. Note the following scope of support on the Red Hat Customer Portal for these features:
Technology Preview Features Support Scope
In the following table, features are marked with the following statuses:
- Not Available
- Technology Preview
- General Availability
- Deprecated
- Removed
For IBM Power and IBM Z, the following exceptions apply:
- For version 4.20 and later, you must run the control plane on machine types that are based on 64-bit x86 architecture or s390x architecture, and node pools on IBM Power or IBM Z.
- For version 4.19 and earlier, you must run the control plane on machine types that are based on 64-bit x86 architecture, and node pools on IBM Power or IBM Z.
| Feature | 4.20 | 4.21 | 4.22 |
|---|---|---|---|
| Hosted control planes for OpenShift Container Platform using non-bare-metal agent machines | Technology Preview | Technology Preview | Technology Preview |
| Hosted control planes for OpenShift Container Platform on RHOSP | Technology Preview | Technology Preview | Technology Preview |
| Custom taints and tolerations | Technology Preview | Technology Preview | Technology Preview |
| NVIDIA GPU devices on hosted control planes for OpenShift Virtualization | Technology Preview | Technology Preview | Technology Preview |
| Hosted control planes for OpenShift Virtualization on IBM Z [1] | Not Available | Technology Preview | General Availability |
| Hosted control planes on IBM Z in a disconnected environment | General Availability | General Availability | General Availability |
| Hosted control planes for OpenShift Container Platform on Microsoft Azure | Not Available | Not Available | Technology Preview |
| Backup and restore with the etcd snapshot method | Not Available | Not Available | Technology Preview |
[1] Hosted control planes for OpenShift Virtualization on IBM Z is supported as Technology Preview starting with OpenShift Container Platform 4.21, multicluster engine for Kubernetes Operator 2.11, and Red Hat Advanced Cluster Management (RHACM) 2.16. Creating hosted control planes with external infrastructure is not supported.
1.4. Known issues
This section includes several known issues for hosted control planes for OpenShift Container Platform 4.22.
- If the annotation and the `ManagedCluster` resource name do not match, the multicluster engine for Kubernetes Operator console displays the cluster as `Pending import`. The cluster cannot be used by the multicluster engine Operator. The same issue happens when there is no annotation and the `ManagedCluster` name does not match the `Infra-ID` value of the `HostedCluster` resource.
- When you use the multicluster engine for Kubernetes Operator console to add a new node pool to an existing hosted cluster, the same version of OpenShift Container Platform might appear more than once in the list of options. You can select any instance in the list for the version that you want.
- When a node pool is scaled down to 0 workers, the list of hosts in the console still shows nodes in a `Ready` state. You can verify the number of nodes in two ways:
  - In the console, go to the node pool and verify that it has 0 nodes.
  - On the command-line interface, run the following commands:
    - Verify that 0 nodes are in the node pool by running the following command:
      $ oc get nodepool -A
    - Verify that 0 nodes are in the cluster by running the following command:
      $ oc get nodes --kubeconfig
    - Verify that 0 agents are reported as bound to the cluster by running the following command:
      $ oc get agents -A
- When you create a hosted cluster in an environment that uses the dual-stack network, you might encounter pods stuck in the `ContainerCreating` state. This issue occurs because the `openshift-service-ca-operator` resource cannot generate the `metrics-tls` secret that the DNS pods need for DNS resolution. As a result, the pods cannot resolve the Kubernetes API server. To resolve this issue, configure the DNS server settings for a dual-stack network.
- If you created a hosted cluster in the same namespace as its managed cluster, detaching the managed hosted cluster deletes everything in the managed cluster namespace, including the hosted cluster. The following situations can create a hosted cluster in the same namespace as its managed cluster:
  - You created a hosted cluster on the Agent platform through the multicluster engine for Kubernetes Operator console by using the default hosted cluster namespace.
  - You created a hosted cluster through the command-line interface or API by specifying the hosted cluster namespace to be the same as the hosted cluster name.
- When you use the console or API to specify an IPv6 address for the `spec.services.servicePublishingStrategy.nodePort.address` field of a hosted cluster, a full IPv6 address with 8 hextets is required. For example, instead of specifying `2620:52:0:1306::30`, you need to specify `2620:52:0:1306:0:0:0:30`.
- In hosted control planes on OpenShift Virtualization, if you store all hosted cluster information in a shared namespace and then back up and restore a hosted cluster, you might unintentionally change other hosted clusters. To avoid this issue, back up and restore only hosted clusters that use labels, or avoid storing all hosted cluster information in a shared namespace.
- For version 4.21, hosted control planes pins all Cluster API images to the `4.20.10-multi` release image for compatibility reasons. Hosted control planes pins the images when Cluster API deployments are generated. The `4.20.10-multi` image must always be mirrored and available in order for the Cluster API to work with hosted control planes version 4.21.
- Intermittent egress IP outages occur when a hosted cluster uses the following combined settings:
  - The service publishing strategy for the `Konnectivity` service is set to `Route`.
  - The management cluster uses Virtual Router Redundancy Protocol (VRRP) VIP for ingress.
-
The service publishing strategy for the