Chapter 5. Adopting the data plane
Adopting the Red Hat OpenStack Services on OpenShift (RHOSO) data plane involves the following steps:
- Stopping any remaining services on the Red Hat OpenStack Platform (RHOSP) control plane.
- Deploying the required custom resources.
- If applicable, performing a fast-forward upgrade on Compute services from RHOSP 17.1 to RHOSO 18.0.
After the RHOSO control plane is managing the newly deployed data plane, you must not re-enable services on the RHOSP 17.1 control plane and data plane. Re-enabling services causes workloads to be managed by two control planes or two data planes, resulting in data corruption, loss of control of existing workloads, inability to start new workloads, or other issues.
5.1. Stopping infrastructure management and Compute services
The source cloud's control plane can be decommissioned, which means taking down only the cloud controller, database, and messaging nodes. Nodes that run the Compute, storage, or networker roles (in terms of the composable roles covered by director Heat templates) must remain functional.
Prerequisites
Define the following shell variables. The values that are used are examples and refer to a single node standalone director deployment. Replace these example values with values that are correct for your environment:
EDPM_PRIVATEKEY_PATH="<path to SSH key>"
declare -A computes
computes=(
  ["standalone.localdomain"]="192.168.122.100"
  # ...
)
- Replace ["standalone.localdomain"]="192.168.122.100" with the name of the Compute node and its IP address.

These SSH variables and commands are used instead of Ansible to create instructions that are independent of where they are running. However, Ansible commands could be used to achieve the same result if you are on the right host, for example, to stop a service:

. stackrc
ansible -i $(which tripleo-ansible-inventory) Compute -m shell -a "sudo systemctl stop tripleo_virtqemud.service" -b
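As a minimal illustration of how these shell variables can drive per-node commands, the following sketch iterates over the computes map and prints the SSH command that would stop a service on each node. The key path and host map are placeholder assumptions, and echo is used so that nothing is actually executed; drop the echo to run the commands.

```shell
# Placeholder values; substitute your real key path and Compute nodes.
EDPM_PRIVATEKEY_PATH="$HOME/.ssh/id_rsa"
declare -A computes
computes=( ["standalone.localdomain"]="192.168.122.100" )

# Print (rather than run) the per-node command; remove 'echo' to execute.
for host in "${!computes[@]}"; do
  ip="${computes[$host]}"
  echo ssh -i "$EDPM_PRIVATEKEY_PATH" "root@$ip" \
    sudo systemctl stop tripleo_virtqemud.service
done
```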
Procedure
Stop the remaining control plane services that are managed by Pacemaker. The following script checks for the galera-bundle, haproxy-bundle, and rabbitmq-bundle resources and disables them from the first reachable controller node:
PacemakerResourcesToStop=(
  "galera-bundle"
  "haproxy-bundle"
  "rabbitmq-bundle"
)

echo "Stopping pacemaker services"
for i in {1..3}; do
  SSH_CMD=CONTROLLER${i}_SSH
  if [ ! -z "${!SSH_CMD}" ]; then
    echo "Using controller $i to run pacemaker commands"
    for resource in ${PacemakerResourcesToStop[*]}; do
      if ${!SSH_CMD} sudo pcs resource config $resource; then
        ${!SSH_CMD} sudo pcs resource disable $resource
      fi
    done
    break
  fi
done
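The script relies on Bash indirect expansion: ${!SSH_CMD} resolves the variable whose name is stored in SSH_CMD, so the loop uses the first controller whose CONTROLLER<i>_SSH variable is set. A self-contained sketch of that mechanism, with an echo stub standing in for the real SSH command:

```shell
# Dummy stand-ins for the CONTROLLER<i>_SSH variables; controller 1 is
# "unreachable" (empty), controller 2 carries an echo stub instead of ssh.
CONTROLLER1_SSH=""
CONTROLLER2_SSH="echo controller-2:"

for i in 1 2 3; do
  SSH_CMD=CONTROLLER${i}_SSH
  if [ ! -z "${!SSH_CMD}" ]; then
    # ${!SSH_CMD} expands to the value of CONTROLLER2_SSH here
    ${!SSH_CMD} pcs resource disable galera-bundle
    break
  fi
done
# prints: controller-2: pcs resource disable galera-bundle
```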
5.2. Adopting Compute services to the RHOSO data plane
Prerequisites
- You have stopped the remaining infrastructure management and Compute services on the source cloud Compute hosts. For more information, see Stopping infrastructure management and Compute services.
- The Ceph backend for the Compute service (nova) and libvirt is configured, if applicable. For more information, see Configuring a Ceph backend.
- Make sure that IPAM is configured:
oc apply -f - <<EOF
apiVersion: network.openstack.org/v1beta1
kind: NetConfig
metadata:
  name: netconfig
spec:
  networks:
  - name: ctlplane
    dnsDomain: ctlplane.example.com
    subnets:
    - name: subnet1
      allocationRanges:
      - end: 192.168.122.120
        start: 192.168.122.100
      - end: 192.168.122.200
        start: 192.168.122.150
      cidr: 192.168.122.0/24
      gateway: 192.168.122.1
  - name: internalapi
    dnsDomain: internalapi.example.com
    subnets:
    - name: subnet1
      allocationRanges:
      - end: 172.17.0.250
        start: 172.17.0.100
      cidr: 172.17.0.0/24
      vlan: 20
  - name: External
    dnsDomain: external.example.com
    subnets:
    - name: subnet1
      allocationRanges:
      - end: 10.0.0.250
        start: 10.0.0.100
      cidr: 10.0.0.0/24
      gateway: 10.0.0.1
  - name: storage
    dnsDomain: storage.example.com
    subnets:
    - name: subnet1
      allocationRanges:
      - end: 172.18.0.250
        start: 172.18.0.100
      cidr: 172.18.0.0/24
      vlan: 21
  - name: storagemgmt
    dnsDomain: storagemgmt.example.com
    subnets:
    - name: subnet1
      allocationRanges:
      - end: 172.20.0.250
        start: 172.20.0.100
      cidr: 172.20.0.0/24
      vlan: 23
  - name: tenant
    dnsDomain: tenant.example.com
    subnets:
    - name: subnet1
      allocationRanges:
      - end: 172.19.0.250
        start: 172.19.0.100
      cidr: 172.19.0.0/24
      vlan: 22
EOF
- If neutron-sriov-nic-agent is running on the existing Compute nodes, check the physical device mappings and ensure that they match the values that are defined in the OpenStackDataPlaneNodeSet custom resource (CR). For more information, see Pulling the configuration from a director deployment.
- Define the shell variables that are necessary to run the fast-forward upgrade script. Omit setting CEPH_FSID if the local storage backend is going to be configured by the Compute service (nova) for libvirt. The storage backend cannot be changed during adoption, and must match the one used on the source cloud:
PODIFIED_DB_ROOT_PASSWORD=$(oc get -o json secret/osp-secret | jq -r .data.DbRootPassword | base64 -d)
CEPH_FSID=$(oc get secret ceph-conf-files -o json | jq -r '.data."ceph.conf"' | base64 -d | grep fsid | sed -e 's/fsid = //')

alias openstack="oc exec -t openstackclient -- openstack"

declare -A computes
export computes=(
  ["standalone.localdomain"]="192.168.122.100"
  # ...
)
- Replace ["standalone.localdomain"]="192.168.122.100" with the name of the Compute node and its IP address.
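The CEPH_FSID pipeline above decodes the ceph.conf stored in the secret and extracts the fsid line. The grep/sed portion can be exercised locally against an inline sample ceph.conf; the UUID below is a made-up placeholder:

```shell
# Sample ceph.conf contents; in the real command this comes from
# 'oc get secret ceph-conf-files ... | base64 -d'.
ceph_conf='[global]
fsid = 11111111-2222-3333-4444-555555555555
mon_host = 192.168.122.100'

CEPH_FSID=$(printf '%s\n' "$ceph_conf" | grep fsid | sed -e 's/fsid = //')
echo "$CEPH_FSID"
# prints: 11111111-2222-3333-4444-555555555555
```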
Procedure
Create an SSH authentication secret for the data plane nodes:
oc apply -f - <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: dataplane-adoption-secret
  namespace: openstack
data:
  ssh-privatekey: |
$(cat <path to SSH key> | base64 | sed 's/^/    /')
EOF
Generate an SSH key pair and create the nova-migration-ssh-key secret:

cd "$(mktemp -d)"
ssh-keygen -f ./id -t ecdsa-sha2-nistp521 -N ''
oc get secret nova-migration-ssh-key || oc create secret generic nova-migration-ssh-key \
  -n openstack \
  --from-file=ssh-privatekey=id \
  --from-file=ssh-publickey=id.pub \
  --type kubernetes.io/ssh-auth
rm -f id*
cd -
Create a nova-compute-extra-config service (with local storage backend for libvirt). If TLS Everywhere is enabled, append the following to the OpenStackDataPlaneService spec:

tlsCert:
  contents:
  - dnsnames
  - ips
  networks:
  - ctlplane
  issuer: osp-rootca-issuer-internal
caCerts: combined-ca-bundle
edpmServiceType: nova
oc apply -f - <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: nova-extra-config
  namespace: openstack
data:
  19-nova-compute-cell1-workarounds.conf: |
    [workarounds]
    disable_compute_service_check_for_ffu=true
EOF
The secret nova-cell<X>-compute-config is auto-generated for each cell<X>. You must specify nova-cell<X>-compute-config and nova-migration-ssh-key for each custom OpenStackDataPlaneService related to the Compute service.
This service removes the pre-FFU workarounds and configures the Compute services for a local storage backend.
Or, create a nova-compute-extra-config service (with Ceph backend for libvirt):

oc apply -f - <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: nova-extra-config
  namespace: openstack
data:
  19-nova-compute-cell1-workarounds.conf: |
    [workarounds]
    disable_compute_service_check_for_ffu=true
  03-ceph-nova.conf: |
    [libvirt]
    images_type=rbd
    images_rbd_pool=vms
    images_rbd_ceph_conf=/etc/ceph/ceph.conf
    images_rbd_glance_store_name=default_backend
    images_rbd_glance_copy_poll_interval=15
    images_rbd_glance_copy_timeout=600
    rbd_user=openstack
    rbd_secret_uuid=$CEPH_FSID
EOF
This service removes the pre-FFU workarounds and configures the Compute services for a Ceph storage backend. The resources above must contain cell-specific configurations. For multi-cell deployments, the config maps and Red Hat OpenStack Platform data plane services should be named per cell, for example, nova-custom-ceph-cellX and nova-compute-extraconfig-cellX.
Create a secret for the subscription manager and a secret for the Red Hat registry:
oc apply -f - <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: subscription-manager
data:
  username: <base64 encoded subscription-manager username>
  password: <base64 encoded subscription-manager password>
---
apiVersion: v1
kind: Secret
metadata:
  name: redhat-registry
data:
  username: <base64 encoded registry username>
  password: <base64 encoded registry password>
EOF
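The data fields of a Secret must be base64 encoded. For example, the encoded values can be produced as follows, with myuser as a stand-in for your real user name:

```shell
# Use printf rather than echo so no trailing newline is encoded.
printf '%s' 'myuser' | base64
# prints: bXl1c2Vy

# Decode to double-check that the value round-trips:
printf '%s' 'bXl1c2Vy' | base64 -d
# prints: myuser
```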
Deploy the OpenStackDataPlaneNodeSet CR:
- If TLS Everywhere is enabled, change spec:tlsEnabled to true.
- If you use a custom DNS domain, modify spec:nodes:[NODE NAME]:hostName to use the FQDN for the node.
oc apply -f - <<EOF
apiVersion: dataplane.openstack.org/v1beta1
kind: OpenStackDataPlaneNodeSet
metadata:
  name: openstack
spec:
  tlsEnabled: false
  networkAttachments:
  - ctlplane
  preProvisioned: true
  services:
  - bootstrap
  - download-cache
  - configure-network
  - validate-network
  - install-os
  - configure-os
  - ssh-known-hosts
  - run-os
  - install-certs
  - libvirt
  - nova
  - ovn
  - neutron-metadata
  env:
  - name: ANSIBLE_CALLBACKS_ENABLED
    value: "profile_tasks"
  - name: ANSIBLE_FORCE_COLOR
    value: "True"
  nodes:
    standalone:
      hostName: standalone
      ansible:
        ansibleHost: ${computes[standalone.localdomain]}
      networks:
      - defaultRoute: true
        fixedIP: ${computes[standalone.localdomain]}
        name: ctlplane
        subnetName: subnet1
      - name: internalapi
        subnetName: subnet1
      - name: storage
        subnetName: subnet1
      - name: tenant
        subnetName: subnet1
  nodeTemplate:
    ansibleSSHPrivateKeySecret: dataplane-adoption-secret
    ansible:
      ansibleUser: root
      ansibleVarsFrom:
      - prefix: subscription_manager_
        secretRef:
          name: subscription-manager
      - prefix: registry_
        secretRef:
          name: redhat-registry
      ansibleVars:
        edpm_bootstrap_release_version_package: []
        service_net_map:
          nova_api_network: internalapi
          nova_libvirt_network: internalapi

        # edpm_network_config
        # Default nic config template for a EDPM compute node
        # These vars are edpm_network_config role vars
        edpm_network_config_template: |
          ---
          {% set mtu_list = [ctlplane_mtu] %}
          {% for network in nodeset_networks %}
          {{ mtu_list.append(lookup('vars', networks_lower[network] ~ '_mtu')) }}
          {%- endfor %}
          {% set min_viable_mtu = mtu_list | max %}
          network_config:
          - type: ovs_bridge
            name: {{ neutron_physical_bridge_name }}
            mtu: {{ min_viable_mtu }}
            use_dhcp: false
            dns_servers: {{ ctlplane_dns_nameservers }}
            domain: {{ dns_search_domains }}
            addresses:
            - ip_netmask: {{ ctlplane_ip }}/{{ ctlplane_cidr }}
            routes: {{ ctlplane_host_routes }}
            members:
            - type: interface
              name: nic1
              mtu: {{ min_viable_mtu }}
              # force the MAC address of the bridge to this interface
              primary: true
            {% for network in nodeset_networks %}
            - type: vlan
              mtu: {{ lookup('vars', networks_lower[network] ~ '_mtu') }}
              vlan_id: {{ lookup('vars', networks_lower[network] ~ '_vlan_id') }}
              addresses:
              - ip_netmask: {{ lookup('vars', networks_lower[network] ~ '_ip') }}/{{ lookup('vars', networks_lower[network] ~ '_cidr') }}
              routes: {{ lookup('vars', networks_lower[network] ~ '_host_routes') }}
            {% endfor %}

        edpm_network_config_hide_sensitive_logs: false
        #
        # These vars are for the network config templates themselves and are
        # considered EDPM network defaults.
        neutron_physical_bridge_name: br-ctlplane
        neutron_public_interface_name: eth0

        # edpm_nodes_validation
        edpm_nodes_validation_validate_controllers_icmp: false
        edpm_nodes_validation_validate_gateway_icmp: false

        # edpm ovn-controller configuration
        edpm_ovn_bridge_mappings: <bridge_mappings>
        edpm_ovn_bridge: br-int
        edpm_ovn_encap_type: geneve
        ovn_match_northd_version: false
        ovn_monitor_all: true
        edpm_ovn_remote_probe_interval: 60000
        edpm_ovn_ofctrl_wait_before_clear: 8000

        timesync_ntp_servers:
        - hostname: clock.redhat.com
        - hostname: clock2.redhat.com

        edpm_bootstrap_command: |
          subscription-manager register --username {{ subscription_manager_username }} --password {{ subscription_manager_password }}
          subscription-manager release --set=9.2
          subscription-manager repos --disable=*
          subscription-manager repos --enable=rhel-9-for-x86_64-baseos-eus-rpms --enable=rhel-9-for-x86_64-appstream-eus-rpms --enable=rhel-9-for-x86_64-highavailability-eus-rpms --enable=openstack-17.1-for-rhel-9-x86_64-rpms --enable=fast-datapath-for-rhel-9-x86_64-rpms --enable=openstack-dev-preview-for-rhel-9-x86_64-rpms
          # FIXME: perform dnf upgrade for other packages in EDPM ansible
          # here we only ensuring that decontainerized libvirt can start
          dnf -y upgrade openstack-selinux
          rm -f /run/virtlogd.pid
          podman login -u {{ registry_username }} -p {{ registry_password }} registry.redhat.io

        gather_facts: false
        enable_debug: false
        # edpm firewall, change the allowed CIDR if needed
        edpm_sshd_configure_firewall: true
        edpm_sshd_allowed_ranges: ['192.168.122.0/24']
        # SELinux module
        edpm_selinux_mode: enforcing

        # Do not attempt OVS major upgrades here
        edpm_ovs_packages:
        - openvswitch3.1
EOF
If the Block Storage service (cinder) is configured to use a Ceph backend, prepare the adopted EDPM workloads to use it:
oc patch osdpns/openstack --type=merge --patch "
spec:
  services:
  - repo-setup
  - download-cache
  - bootstrap
  - configure-network
  - validate-network
  - install-os
  - configure-os
  - run-os
  - install-certs
  - ceph-client
  - libvirt
  - nova
  - ovn
  - neutron-metadata
  nodeTemplate:
    extraMounts:
    - extraVolType: Ceph
      volumes:
      - name: ceph
        secret:
          secretName: ceph-conf-files
      mounts:
      - name: ceph
        mountPath: "/etc/ceph"
        readOnly: true
"
Replace <bridge_mappings> with the value of the bridge mappings in your configuration, for example, "datacentre:br-ctlplane".
Ensure that the ovn-controller settings that are configured in the OpenStackDataPlaneNodeSet CR are the same as were set in the Compute nodes before adoption. This configuration is stored in the external_ids column in the Open_vSwitch table in the Open vSwitch database:

ovs-vsctl list Open .
...
external_ids : {hostname=standalone.localdomain, ovn-bridge=br-int, ovn-bridge-mappings=<bridge_mappings>, ovn-chassis-mac-mappings="datacentre:1e:0a:bb:e6:7c:ad", ovn-encap-ip="172.19.0.100", ovn-encap-tos="0", ovn-encap-type=geneve, ovn-match-northd-version=False, ovn-monitor-all=True, ovn-ofctrl-wait-before-clear="8000", ovn-openflow-probe-interval="60", ovn-remote="tcp:ovsdbserver-sb.openstack.svc:6642", ovn-remote-probe-interval="60000", rundir="/var/run/openvswitch", system-id="2eec68e6-aa21-4c95-a868-31aeafc11736"}
...
Note that you should retain the original OpenStackDataPlaneNodeSet services composition, except for the inserted ceph-client service.
Optional: Enable neutron-sriov-nic-agent in the OpenStackDataPlaneNodeSet CR:

oc patch openstackdataplanenodeset openstack --type='json' --patch='[
  {
    "op": "add",
    "path": "/spec/services/-",
    "value": "neutron-sriov"
  },
  {
    "op": "add",
    "path": "/spec/nodeTemplate/ansible/ansibleVars/edpm_neutron_sriov_agent_SRIOV_NIC_physical_device_mappings",
    "value": "dummy_sriov_net:dummy-dev"
  },
  {
    "op": "add",
    "path": "/spec/nodeTemplate/ansible/ansibleVars/edpm_neutron_sriov_agent_SRIOV_NIC_resource_provider_bandwidths",
    "value": "dummy-dev:40000000:40000000"
  },
  {
    "op": "add",
    "path": "/spec/nodeTemplate/ansible/ansibleVars/edpm_neutron_sriov_agent_SRIOV_NIC_resource_provider_hypervisors",
    "value": "dummy-dev:standalone.localdomain"
  }
]'
Optional: Enable neutron-dhcp in the OpenStackDataPlaneNodeSet CR:

oc patch openstackdataplanenodeset openstack --type='json' --patch='[
  {
    "op": "add",
    "path": "/spec/services/-",
    "value": "neutron-dhcp"
  }
]'
Run the pre-adoption validation:

Create the validation service:

oc apply -f - <<EOF
apiVersion: dataplane.openstack.org/v1beta1
kind: OpenStackDataPlaneService
metadata:
  name: pre-adoption-validation
spec:
  playbook: osp.edpm.pre_adoption_validation
EOF

Create an OpenStackDataPlaneDeployment CR that runs only the validation:

oc apply -f - <<EOF
apiVersion: dataplane.openstack.org/v1beta1
kind: OpenStackDataPlaneDeployment
metadata:
  name: openstack-pre-adoption
spec:
  nodeSets:
  - openstack
  servicesOverride:
  - pre-adoption-validation
EOF
Wait for the validation to finish.

Confirm that all the Ansible EE pods reach a Completed status:

# watching the pods
watch oc get pod -l app=openstackansibleee

# following the ansible logs with:
oc logs -l app=openstackansibleee -f --max-log-requests 20

Wait for the deployment to reach the Ready status:

oc wait --for condition=Ready openstackdataplanedeployment/openstack-pre-adoption --timeout=10m
If any openstack-pre-adoption validations fail, you must first determine which ones were unsuccessful based on the Ansible logs, and then follow the instructions below for each case:
- If the hostname validation failed, check that the hostname of the EDPM node is correctly listed in the OpenStackDataPlaneNodeSet.
- If the kernel argument check failed, make sure that the OpenStackDataPlaneNodeSet has the same kernel argument configuration in the edpm_kernel_args and edpm_kernel_hugepages variables as what is used on the RHOSP 17.1 node.
- If the tuned profile check failed, make sure that the edpm_tuned_profile variable in the OpenStackDataPlaneNodeSet is configured to use the same profile as set on the (source) RHOSP 17.1 node.
Remove the leftover director services:

Create the cleanup data plane service:

oc apply -f - <<EOF
apiVersion: dataplane.openstack.org/v1beta1
kind: OpenStackDataPlaneService
metadata:
  name: tripleo-cleanup
spec:
  playbook: osp.edpm.tripleo_cleanup
EOF

Create an OpenStackDataPlaneDeployment CR to run the cleanup:

oc apply -f - <<EOF
apiVersion: dataplane.openstack.org/v1beta1
kind: OpenStackDataPlaneDeployment
metadata:
  name: tripleo-cleanup
spec:
  nodeSets:
  - openstack
  servicesOverride:
  - tripleo-cleanup
EOF
- Wait for the removal to finish.
Deploy the OpenStackDataPlaneDeployment CR:

oc apply -f - <<EOF
apiVersion: dataplane.openstack.org/v1beta1
kind: OpenStackDataPlaneDeployment
metadata:
  name: openstack
spec:
  nodeSets:
  - openstack
EOF
Verification
Confirm that all the Ansible EE pods reach a Completed status:

# watching the pods
watch oc get pod -l app=openstackansibleee

# following the ansible logs with:
oc logs -l app=openstackansibleee -f --max-log-requests 20
Wait for the data plane node set to reach the Ready status:

oc wait --for condition=Ready osdpns/openstack --timeout=30m
Verify that Networking service (neutron) agents are alive:
oc exec openstackclient -- openstack network agent list

+--------------------------------------+----------------------+------------------------+-------------------+-------+-------+----------------------------+
| ID                                   | Agent Type           | Host                   | Availability Zone | Alive | State | Binary                     |
+--------------------------------------+----------------------+------------------------+-------------------+-------+-------+----------------------------+
| 174fc099-5cc9-4348-b8fc-59ed44fcfb0e | DHCP agent           | standalone.localdomain | nova              | :-)   | UP    | neutron-dhcp-agent         |
| 10482583-2130-5b0d-958f-3430da21b929 | OVN Metadata agent   | standalone.localdomain |                   | :-)   | UP    | neutron-ovn-metadata-agent |
| a4f1b584-16f1-4937-b2b0-28102a3f6eaa | OVN Controller agent | standalone.localdomain |                   | :-)   | UP    | ovn-controller             |
+--------------------------------------+----------------------+------------------------+-------------------+-------+-------+----------------------------+
5.3. Performing a fast-forward upgrade on Compute services
A rolling upgrade of the Compute services cannot be performed during adoption, and the data plane is not upgraded in lock-step with the Compute control plane services, because the two are managed independently: by data plane Ansible and by Kubernetes Operators. The Compute service operator and the Data Plane Operator ensure that upgrading is done independently of each other by configuring [upgrade_levels]compute=auto for the Compute services. The Compute control plane services apply the change right after the custom resource (CR) is patched. The Compute data plane services catch up with the same configuration change later on, during the Ansible deployment.
Procedure
Wait for the cell1 Compute data plane services versions to be updated (it may take some time):

oc exec openstack-cell1-galera-0 -c galera -- mysql -rs -uroot -p$PODIFIED_DB_ROOT_PASSWORD \
  -e "select a.version from nova_cell1.services a join nova_cell1.services b where a.version!=b.version and a.binary='nova-compute';"

The query returns an empty result when the update has completed; treat that as the completion criterion.
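If you script this wait, a small polling helper can re-run the query until it produces no output. The sketch below is generic and uses true (which prints nothing) as a stand-in command so it can be exercised locally; in practice you would substitute the oc exec ... mysql query from the step above:

```shell
# Retry a command until its stdout is empty or the attempt limit is hit.
wait_for_empty() {
  local attempts=$1
  shift
  local out i
  for ((i = 1; i <= attempts; i++)); do
    out=$("$@")
    [ -z "$out" ] && return 0
    sleep 5
  done
  return 1
}

# Stand-in command; replace 'true' with the real version-mismatch query.
wait_for_empty 60 true && echo "nova-compute versions converged"
```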
Remove pre-fast-forward upgrade workarounds for Compute control plane services:
oc patch openstackcontrolplane openstack -n openstack --type=merge --patch '
spec:
  nova:
    template:
      cellTemplates:
        cell0:
          conductorServiceTemplate:
            customServiceConfig: |
              [workarounds]
              disable_compute_service_check_for_ffu=false
        cell1:
          metadataServiceTemplate:
            customServiceConfig: |
              [workarounds]
              disable_compute_service_check_for_ffu=false
          conductorServiceTemplate:
            customServiceConfig: |
              [workarounds]
              disable_compute_service_check_for_ffu=false
      apiServiceTemplate:
        customServiceConfig: |
          [workarounds]
          disable_compute_service_check_for_ffu=false
      metadataServiceTemplate:
        customServiceConfig: |
          [workarounds]
          disable_compute_service_check_for_ffu=false
      schedulerServiceTemplate:
        customServiceConfig: |
          [workarounds]
          disable_compute_service_check_for_ffu=false
'
Wait for Compute control plane services' CRs to be ready:
oc wait --for condition=Ready --timeout=300s Nova/nova
Remove pre-fast-forward upgrade workarounds for Compute data plane services:
oc apply -f - <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: nova-extra-config
  namespace: openstack
data:
  20-nova-compute-cell1-workarounds.conf: |
    [workarounds]
    disable_compute_service_check_for_ffu=false
---
apiVersion: dataplane.openstack.org/v1beta1
kind: OpenStackDataPlaneDeployment
metadata:
  name: openstack-nova-compute-ffu
  namespace: openstack
spec:
  nodeSets:
  - openstack
  servicesOverride:
  - nova
EOF
Wait for Compute data plane service to be ready:
oc wait --for condition=Ready openstackdataplanedeployment/openstack-nova-compute-ffu --timeout=5m
Run Compute database online migrations to complete the fast-forward upgrade:
oc exec -it nova-cell0-conductor-0 -- nova-manage db online_data_migrations
oc exec -it nova-cell1-conductor-0 -- nova-manage db online_data_migrations
Discover Compute hosts in the cell:
oc rsh nova-cell0-conductor-0 nova-manage cell_v2 discover_hosts --verbose
Verify if Compute services can stop the existing test VM instance:
${BASH_ALIASES[openstack]} server list | grep -qF '| test | ACTIVE |' && ${BASH_ALIASES[openstack]} server stop test || echo PASS
${BASH_ALIASES[openstack]} server list | grep -qF '| test | SHUTOFF |' || echo FAIL
${BASH_ALIASES[openstack]} server --os-compute-api-version 2.48 show --diagnostics test 2>&1 || echo PASS
Verify if Compute services can start the existing test VM instance:
${BASH_ALIASES[openstack]} server list | grep -qF '| test | SHUTOFF |' && ${BASH_ALIASES[openstack]} server start test || echo PASS
${BASH_ALIASES[openstack]} server list | grep -F '| test | ACTIVE |' && \
  ${BASH_ALIASES[openstack]} server --os-compute-api-version 2.48 show --diagnostics test --fit-width -f json | jq -r '.state' | grep running || echo FAIL
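These steps invoke ${BASH_ALIASES[openstack]} rather than a bare openstack because Bash does not expand aliases in non-interactive shells; the BASH_ALIASES array still holds the alias body, so expanding it reconstructs the wrapped command. A local sketch with a dummy alias (the real alias, defined in the prerequisites, wraps oc exec -t openstackclient -- openstack):

```shell
# Define a dummy alias; defining works non-interactively even though
# normal alias expansion would not.
alias openstack='echo openstack-cli'

# ${BASH_ALIASES[openstack]} expands to the alias body 'echo openstack-cli',
# so the line below runs: echo openstack-cli server list
${BASH_ALIASES[openstack]} server list
# prints: openstack-cli server list
```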
After the data plane adoption, the hosts continue to run Red Hat Enterprise Linux (RHEL) 9.2. To take advantage of RHEL 9.4, perform a minor update procedure after finishing the adoption procedure.