Chapter 8. Troubleshooting the Bare Metal Provisioning service
Diagnose issues in an environment that includes the Bare Metal Provisioning service (ironic).
8.1. PXE boot errors
Use the following troubleshooting procedures to assess and remedy issues you might encounter with PXE boot.
Permission Denied errors
If the console of your bare metal node returns a Permission Denied error, ensure that you have applied the appropriate SELinux context to the /httpboot and /tftpboot directories:
# semanage fcontext -a -t httpd_sys_content_t "/httpboot(/.*)?"
# restorecon -r -v /httpboot
# semanage fcontext -a -t tftpdir_t "/tftpboot(/.*)?"
# restorecon -r -v /tftpboot
Boot process freezes at /pxelinux.cfg/XX-XX-XX-XX-XX-XX
If the node console shows that the node receives an IP address but the boot process then stops, you might be using the wrong PXE boot template in your ironic.conf file. Check which template is configured:
$ grep ^pxe_config_template ironic.conf
pxe_config_template=$pybasedir/drivers/modules/ipxe_config.template
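The default template is pxe_config.template. Because pxe_config.template and ipxe_config.template differ only by the leading i, it is easy to enter the wrong one. Ensure that the value names the template that you intend to use.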
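The grep check can be extended so that it reports only the template file name and compares it with the template that you intend to use. This is a minimal sketch: the function name check_pxe_template is hypothetical, and it assumes that the option appears on a single pxe_config_template line, as in the example output.

```shell
#!/bin/sh
# Hypothetical helper: report which PXE template ironic.conf selects.
# Usage: check_pxe_template PATH_TO_IRONIC_CONF EXPECTED_TEMPLATE
check_pxe_template() {
    # Take the value of the first line that starts with pxe_config_template,
    # dropping any spaces or tabs around the "=" sign.
    value=$(grep '^pxe_config_template' "$1" | head -n 1 | cut -d= -f2- | tr -d ' \t')
    # Keep only the file name, for example ipxe_config.template.
    template=${value##*/}
    if [ "$template" = "$2" ]; then
        echo "OK: $template"
    else
        echo "Configured template is '$template', expected '$2'" >&2
        return 1
    fi
}
```

For example, `check_pxe_template ironic.conf pxe_config.template` succeeds only if the file selects pxe_config.template, and prints the configured name otherwise.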
			
8.2. Login errors after the bare metal node boots
If you cannot log in to the node with the root password that you set during configuration, this indicates that the node has not booted into the deployed image. You might be logged in to the deploy-kernel/deploy-ramdisk image, and the system has not yet loaded the correct image.
To fix this issue, inspect the PXE boot configuration file /httpboot/pxelinux.cfg/MAC_ADDRESS on the Compute or Bare Metal Provisioning service node, and ensure that all the IP addresses listed in this file correspond to IP addresses on the Bare Metal network.
The only network that the Bare Metal Provisioning service node uses is the Bare Metal network. If one of the endpoints is not on the network, the endpoint cannot reach the Bare Metal Provisioning service node as a part of the boot process.
For example, the kernel line in your file might look like the following:
kernel http://192.168.200.2:8088/5a6cdbe3-2c90-4a90-b3c6-85b449b30512/deploy_kernel selinux=0 disk=cciss/c0d0,sda,hda,vda iscsi_target_iqn=iqn.2008-10.org.openstack:5a6cdbe3-2c90-4a90-b3c6-85b449b30512 deployment_id=5a6cdbe3-2c90-4a90-b3c6-85b449b30512 deployment_key=VWDYDVVEFCQJNOSTO9R67HKUXUGP77CK ironic_api_url=http://192.168.200.2:6385 troubleshoot=0 text nofb nomodeset vga=normal boot_option=netboot ip=${ip}:${next-server}:${gateway}:${netmask} BOOTIF=${mac}  ipa-api-url=http://192.168.200.2:6385 ipa-driver-name=ipmi boot_mode=bios initrd=deploy_ramdisk coreos.configdrive=0 || goto deploy

| Value in the example kernel line | Corresponding information |
|---|---|
| http://192.168.200.2:8088 | Parameter |
| 5a6cdbe3-2c90-4a90-b3c6-85b449b30512 | UUID of the bare metal node in |
| deploy_kernel | This is the deploy kernel image in the Image service that is copied down as |
| http://192.168.200.2:6385 | Parameter |
| ipmi | The IPMI driver in use by the Bare Metal Provisioning service for this node. |
| deploy_ramdisk | This is the deploy ramdisk image in the Image service that is copied down as |
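To compare the file against the Bare Metal network, it can help to list every distinct endpoint that the file references. This is a minimal sketch: the function name list_pxe_endpoints is hypothetical, and it assumes that endpoints are written as http:// or https:// URLs with a numeric IPv4 address and port, as in the example kernel line. Host names are not matched.

```shell
#!/bin/sh
# Hypothetical helper: list the distinct HTTP endpoints (scheme, IPv4 address,
# and port) that a PXE boot configuration file references, for example
# /httpboot/pxelinux.cfg/MAC_ADDRESS.
# Usage: list_pxe_endpoints FILE
list_pxe_endpoints() {
    grep -oE 'https?://[0-9]{1,3}(\.[0-9]{1,3}){3}:[0-9]+' "$1" | sort -u
}
```

For the example kernel line, this lists http://192.168.200.2:6385 and http://192.168.200.2:8088, and you can then confirm that 192.168.200.2 is an address on the Bare Metal network.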
If a value in the /httpboot/pxelinux.cfg/MAC_ADDRESS file does not match the corresponding value in the ironic.conf file:
- Update the value in the ironic.conf file.
- Restart the Bare Metal Provisioning service.
- Re-deploy the bare metal instance.
8.3. Boot-to-disk errors on deployed nodes
On certain hardware, deployed nodes might fail to boot from disk during subsequent boot operations that are part of a deployment. This usually happens because the BMC does not honor the persistent boot settings that director requests on the nodes, so the nodes boot from a PXE target instead.
In this case, you must update the boot order in the BIOS of the nodes. Set the HDD as the first boot device and PXE as a later option, so that the nodes boot from disk by default but can boot from the network during introspection or deployment as necessary.
This error mostly applies to nodes that use legacy BIOS firmware.
8.4. The Bare Metal Provisioning service does not receive the correct host name
If the Bare Metal Provisioning service does not receive the correct host name, cloud-init is failing. To fix this, connect the Bare Metal subnet to a router in the OpenStack Networking service. This configuration routes requests to the metadata agent correctly.
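The router configuration can be sketched with the openstack router create and openstack router add subnet commands. The function name connect_baremetal_subnet is hypothetical, and the router and subnet names that you pass to it are placeholders for the names in your environment. If a suitable router already exists, run only the openstack router add subnet command.

```shell
#!/bin/sh
# Hypothetical helper: create a router and attach the Bare Metal subnet to it
# so that cloud-init requests from bare metal instances reach the metadata agent.
# Usage: connect_baremetal_subnet ROUTER_NAME SUBNET_NAME_OR_ID
connect_baremetal_subnet() {
    # Stop if the router cannot be created, for example because it already exists.
    openstack router create "$1" || return 1
    openstack router add subnet "$1" "$2"
}
```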
			
8.5. Invalid OpenStack Identity service credentials when executing Bare Metal Provisioning service commands
If you cannot authenticate to the Identity service, check the identity_uri parameter in the ironic.conf file and ensure that you remove the /v2.0 suffix from the keystone AdminURL. For example, set identity_uri to http://IP:PORT.
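The required change is a simple string transformation. This minimal sketch derives an identity_uri value from an AdminURL by removing a trailing /v2.0; the function name is hypothetical, and the address in the usage example is a placeholder from the documentation address range, not a real endpoint.

```shell
#!/bin/sh
# Hypothetical helper: derive an identity_uri value from a keystone AdminURL
# by removing a trailing "/v2.0" and any trailing slash.
# Usage: identity_uri_from_admin_url URL
identity_uri_from_admin_url() {
    url=${1%/}            # drop one trailing slash, if present
    url=${url%/v2.0}      # drop the /v2.0 version suffix, if present
    printf '%s\n' "${url%/}"
}
```

For example, `identity_uri_from_admin_url http://192.0.2.10:35357/v2.0` prints http://192.0.2.10:35357, which has the http://IP:PORT form that identity_uri expects.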
			
8.6. Hardware enrolment
Incorrect node registration details can cause issues with enrolled hardware. Ensure that you enter property names and values correctly. When you input property names incorrectly, the system adds the properties to the node details but ignores them.
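Because misspelled property names are stored but ignored, it can help to check names before you pass them to openstack baremetal node set. This minimal sketch compares each NAME=VALUE pair against a list of commonly used node properties. The function name is hypothetical, and the list (cpus, memory_mb, local_gb, cpu_arch, capabilities) is an assumption that you should adjust for your deployment.

```shell
#!/bin/sh
# Hypothetical helper: warn about node property names that are not in an
# assumed list of expected names. Adjust KNOWN_PROPERTIES for your deployment.
KNOWN_PROPERTIES="cpus memory_mb local_gb cpu_arch capabilities"

# Usage: check_property_names NAME=VALUE [NAME=VALUE ...]
check_property_names() {
    status=0
    for pair in "$@"; do
        name=${pair%%=*}   # text before the first "="
        case " $KNOWN_PROPERTIES " in
            *" $name "*) ;;
            *) echo "Unknown property name: $name" >&2; status=1 ;;
        esac
    done
    return $status
}
```

For example, `check_property_names memory_mb=2048` returns success, while a misspelling such as `memroy_mb=2048` prints a warning and returns a non-zero status.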
Use the openstack baremetal node set command to update node details. For example, to set the amount of memory registered for the node to 2 GB:
$ openstack baremetal node set --property memory_mb=2048 NODE_UUID
8.7. Troubleshooting iDRAC issues
- Redfish management interface fails to set boot device
  When you use the idrac-redfish management interface with certain iDRAC firmware versions and attempt to set the boot device on a bare metal server with UEFI boot, iDRAC returns the following error:
  Unable to Process the request because the value entered for the parameter Continuous is not supported by the implementation.
  If you encounter this issue, set the force_persistent_boot_device parameter in the driver-info on the node to Never:
  $ openstack baremetal node set --driver-info force_persistent_boot_device=Never ${node_uuid}
- Timeout when powering off
  Some servers power off slowly and time out. The default retry count is 6, which results in a 30-second timeout. To increase the timeout duration to 90 seconds, set the ironic::agent::rpc_response_timeout value to 18 in the undercloud hieradata overrides file and re-run the openstack undercloud install command:
  ironic::agent::rpc_response_timeout: 18
- Vendor passthrough timeout
  When iDRAC is not available to execute vendor passthrough commands, these commands take too long and time out:
  $ openstack baremetal node passthru call --http-method GET \
    aed58dca-1b25-409a-a32f-3a817d59e1e0 list_unfinished_jobs
  Timed out waiting for a reply to message ID 547ce7995342418c99ef1ea4a0054572 (HTTP 500)
  To increase the timeout duration for messaging, increase the value of the ironic::default::rpc_response_timeout parameter in the undercloud hieradata overrides file and re-run the openstack undercloud install command:
  ironic::default::rpc_response_timeout: 600
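The retry values in the power-off example follow from the figures given there: a retry count of 6 gives a 30-second timeout, which implies 5 seconds per retry, so a 90-second timeout needs 18 retries. This sketch performs that calculation for other timeouts. The function name is hypothetical, and the 5-second interval is inferred from the 6-retry, 30-second example rather than read from the service configuration.

```shell
#!/bin/sh
# Hypothetical helper: compute the retry count for a target power-off timeout.
# Assumes 5 seconds per retry, inferred from "6 retries = 30 second timeout".
SECONDS_PER_RETRY=5

# Usage: retries_for_timeout TIMEOUT_SECONDS
retries_for_timeout() {
    # Round up so that the resulting timeout is at least the requested value.
    echo $(( ($1 + SECONDS_PER_RETRY - 1) / SECONDS_PER_RETRY ))
}
```

For example, `retries_for_timeout 90` prints 18, the value used in the hieradata example.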
8.8. Configuring the server console
Console output from overcloud nodes is not always sent to the server console. If you want to view this output in the server console, you must configure the overcloud to use the correct console for your hardware. Use one of the following methods to perform this configuration:
- Modify the KernelArgs heat parameter for each overcloud role.
- Customize the overcloud-full.qcow2 image that director uses to provision the overcloud nodes.
Prerequisites
- A successful undercloud installation. For more information, see the Director Installation and Usage guide.
- Overcloud nodes ready for deployment.
Modifying KernelArgs with heat during deployment
- Log in to the undercloud host as the stack user.
- Source the stackrc credentials file:
  $ source stackrc
- Create an environment file named overcloud-console.yaml with the following content:
  parameter_defaults:
    <role>Parameters:
      KernelArgs: "console=<console-name>"
  Replace <role> with the name of the overcloud role that you want to configure, and replace <console-name> with the ID of the console that you want to use. For example, to configure all overcloud nodes in the default roles to use tty0, add a <role>Parameters section that sets KernelArgs: "console=tty0" for each default role.
- Include the overcloud-console.yaml file in your deployment command with the -e option.
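The environment file in the steps above can also be generated for several roles at once. This minimal sketch writes one <role>Parameters section per role name that you pass. The function name is hypothetical, and the role names in the usage example (Controller, Compute, BlockStorage, ObjectStorage, CephStorage) are assumed to be the default roles; check them against the roles in your deployment.

```shell
#!/bin/sh
# Hypothetical helper: write an environment file that sets the kernel console
# argument for each listed overcloud role.
# Usage: write_console_env FILE CONSOLE ROLE [ROLE ...]
write_console_env() {
    file=$1
    console=$2
    shift 2
    echo "parameter_defaults:" > "$file"
    for role in "$@"; do
        # Each role gets its own <role>Parameters section with KernelArgs.
        printf '  %sParameters:\n    KernelArgs: "console=%s"\n' "$role" "$console" >> "$file"
    done
}
```

For example, run `write_console_env overcloud-console.yaml tty0 Controller Compute BlockStorage ObjectStorage CephStorage`, then add `-e overcloud-console.yaml` to the openstack overcloud deploy command together with the environment files that your deployment already uses.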
Modifying the overcloud-full.qcow2 image
- Log in to the undercloud host as the stack user.
- Source the stackrc credentials file:
  $ source stackrc
- Modify the kernel arguments in the overcloud-full.qcow2 image to set the correct console for your hardware. For example, set the console to tty0:
  $ virt-customize --selinux-relabel -a overcloud-full.qcow2 --run-command 'grubby --update-kernel=ALL --args="console=tty0"'
- Import the image into director:
  $ openstack overcloud image upload --image-path /home/stack/images/overcloud-full.qcow2
- Deploy the overcloud.
Verification
- Log in to an overcloud node from the undercloud:
  $ ssh heat-admin@<IP-address>
  Replace <IP-address> with the IP address of an overcloud node.
- Inspect the contents of the /proc/cmdline file and ensure that the console= parameter is set to the value of the console that you want to use:
  [heat-admin@controller-0 ~]$ cat /proc/cmdline
  BOOT_IMAGE=(hd0,msdos2)/boot/vmlinuz-4.18.0-193.29.1.el8_2.x86_64 root=UUID=0ec3dea5-f293-4729-b676-5d38a611ce81 ro console=tty0 console=ttyS0,115200n81 no_timer_check crashkernel=auto rhgb quiet
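When the command line contains several console= entries, as in the example output, it can be easier to list them one per line. The kernel writes output to every listed console and uses the last one as /dev/console. This is a minimal sketch; the function name list_consoles is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print each console= value from a kernel command line.
# With no argument, reads /proc/cmdline on the current node.
# Usage: list_consoles [CMDLINE_STRING]
list_consoles() {
    if [ $# -gt 0 ]; then
        cmdline=$1
    else
        cmdline=$(cat /proc/cmdline)
    fi
    # Split on spaces without shell globbing, then keep the console= entries.
    printf '%s\n' "$cmdline" | tr ' ' '\n' | while IFS= read -r arg; do
        case $arg in
            console=*) printf '%s\n' "${arg#console=}" ;;
        esac
    done
}
```

On the node from the example, running `list_consoles` prints tty0 followed by ttyS0,115200n81.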