이 콘텐츠는 선택한 언어로 제공되지 않습니다.
Chapter 8. Troubleshooting the Bare Metal Service
The following sections contain information and steps that may be useful for diagnosing issues in a setup with the Bare Metal service enabled.
8.1. PXE Boot Errors 링크 복사링크가 클립보드에 복사되었습니다!
Permission Denied Errors
If you get a permission denied error on the console of your Bare Metal service node, ensure that you have applied the appropriate SELinux context to the /httpboot and /tftpboot directories as follows:
semanage fcontext -a -t httpd_sys_content_t "/httpboot(/.*)?" restorecon -r -v /httpboot semanage fcontext -a -t tftpdir_t "/tftpboot(/.*)?" restorecon -r -v /tftpboot
# semanage fcontext -a -t httpd_sys_content_t "/httpboot(/.*)?"
# restorecon -r -v /httpboot
# semanage fcontext -a -t tftpdir_t "/tftpboot(/.*)?"
# restorecon -r -v /tftpboot
Boot Process Freezes at /pxelinux.cfg/XX-XX-XX-XX-XX-XX
On the console of your node, if it looks like you are getting an IP address and then the process stops as shown below:
This indicates that you might be using the wrong PXE boot template in your ironic.conf file.
grep ^pxe_config_template ironic.conf
$ grep ^pxe_config_template ironic.conf
pxe_config_template=$pybasedir/drivers/modules/ipxe_config.template
The default template is pxe_config.template, so it is easy to omit the i and inadvertently turn this into ipxe_config.template.
8.2. Login Errors After the Bare Metal Node Boots 링크 복사링크가 클립보드에 복사되었습니다!
When you try to log in at the login prompt on the console of the node with the root password that you set in the configurations steps, but are not able to, it indicates you are not booted in to the deployed image. You are probably stuck in the deploy-kernel/deploy-ramdisk image and the system has yet to get the correct image.
To fix this issue, verify the PXE Boot Configuration file in the /httpboot/pxelinux.cfg/MAC_ADDRESS on the Compute or Bare Metal service node and ensure that all the IP addresses listed in this file correspond to IP addresses on the Bare Metal network.
The only network the Bare Metal service node knows about is the Bare Metal network. If one of the endpoints is not on the network, the endpoint cannot reach the Bare Metal service node as a part of the boot process.
For example, the kernel line in your file is as follows:
kernel http://192.168.200.2:8088/5a6cdbe3-2c90-4a90-b3c6-85b449b30512/deploy_kernel selinux=0 disk=cciss/c0d0,sda,hda,vda iscsi_target_iqn=iqn.2008-10.org.openstack:5a6cdbe3-2c90-4a90-b3c6-85b449b30512 deployment_id=5a6cdbe3-2c90-4a90-b3c6-85b449b30512 deployment_key=VWDYDVVEFCQJNOSTO9R67HKUXUGP77CK ironic_api_url=http://192.168.200.2:6385 troubleshoot=0 text nofb nomodeset vga=normal boot_option=netboot ip=${ip}:${next-server}:${gateway}:${netmask} BOOTIF=${mac} ipa-api-url=http://192.168.200.2:6385 ipa-driver-name=ipmi boot_mode=bios initrd=deploy_ramdisk coreos.configdrive=0 || goto deploy
kernel http://192.168.200.2:8088/5a6cdbe3-2c90-4a90-b3c6-85b449b30512/deploy_kernel selinux=0 disk=cciss/c0d0,sda,hda,vda iscsi_target_iqn=iqn.2008-10.org.openstack:5a6cdbe3-2c90-4a90-b3c6-85b449b30512 deployment_id=5a6cdbe3-2c90-4a90-b3c6-85b449b30512 deployment_key=VWDYDVVEFCQJNOSTO9R67HKUXUGP77CK ironic_api_url=http://192.168.200.2:6385 troubleshoot=0 text nofb nomodeset vga=normal boot_option=netboot ip=${ip}:${next-server}:${gateway}:${netmask} BOOTIF=${mac} ipa-api-url=http://192.168.200.2:6385 ipa-driver-name=ipmi boot_mode=bios initrd=deploy_ramdisk coreos.configdrive=0 || goto deploy
Value in the above example kernel line | Corresponding information |
|---|---|
| http://192.168.200.2:8088 |
Parameter |
| 5a6cdbe3-2c90-4a90-b3c6-85b449b30512 |
UUID of the baremetal node in |
| deploy_kernel |
This is the deploy kernel image in the Image service that is copied down as |
| http://192.168.200.2:6385 |
Parameter |
| ipmi | The IPMI Driver in use by the Bare Metal service for this node. |
| deploy_ramdisk |
This is the deploy ramdisk image in the Image service that is copied down as |
If a value does not correspond between the /httpboot/pxelinux.cfg/MAC_ADDRESS and the ironic.conf file:
-
Update the value in the
ironic.conffile - Restart the Bare Metal service
- Re-deploy the Bare Metal instance
8.3. The Bare Metal Service Is Not Getting the Right Hostname 링크 복사링크가 클립보드에 복사되었습니다!
If the Bare Metal service is not getting the right hostname, it means that cloud-init is failing. To fix this, connect the Bare Metal subnet to a router in the OpenStack Networking service. The requests to the meta-data agent should now be routed correctly.
If you are having trouble authenticating to the Identity service, check the identity_uri parameter in the ironic.conf file and ensure that you remove the /v2.0 from the keystone AdminURL. For example, set the identity_uri to http://IP:PORT.
8.5. Hardware Enrollment 링크 복사링크가 클립보드에 복사되었습니다!
Issues with enrolled hardware can be caused by incorrect node registration details. Ensure that property names and values have been entered correctly. Incorrect or mistyped property names will be successfully added to the node’s details, but will be ignored.
Update a node’s details. This example updates the amount of memory the node is registered to use to 2 GB:
openstack baremetal node set --property memory_mb=2048 NODE_UUID
$ openstack baremetal node set --property memory_mb=2048 NODE_UUID
8.6. No Valid Host Errors 링크 복사링크가 클립보드에 복사되었습니다!
If the Compute scheduler cannot find a suitable Bare Metal node on which to boot an instance, a NoValidHost error can be seen in /var/log/nova/nova-conductor.log or immediately upon launch failure in the dashboard. This is usually caused by a mismatch between the resources Compute expects and the resources the Bare Metal node provides.
Check the hypervisor resources that are available:
openstack hypervisor stats show
$ openstack hypervisor stats showCopy to Clipboard Copied! Toggle word wrap Toggle overflow The resources reported here should match the resources that the Bare Metal nodes provide.
Check that Compute recognizes the Bare Metal nodes as hypervisors:
openstack hypervisor list
$ openstack hypervisor listCopy to Clipboard Copied! Toggle word wrap Toggle overflow The nodes, identified by UUID, should appear in the list.
Check the details for a Bare Metal node:
openstack baremetal node list openstack baremetal node show NODE_UUID
$ openstack baremetal node list $ openstack baremetal node show NODE_UUIDCopy to Clipboard Copied! Toggle word wrap Toggle overflow Verify that the node’s details match those reported by Compute.
Check that the selected flavor does not exceed the available resources of the Bare Metal nodes:
openstack flavor show FLAVOR_NAME
$ openstack flavor show FLAVOR_NAMECopy to Clipboard Copied! Toggle word wrap Toggle overflow Check the output of openstack baremetal node list to ensure that Bare Metal nodes are not in maintenance mode. Remove maintenance mode if necessary:
openstack baremetal node maintenance unset NODE_UUID
$ openstack baremetal node maintenance unset NODE_UUIDCopy to Clipboard Copied! Toggle word wrap Toggle overflow Check the output of openstack baremetal node list to ensure that Bare Metal nodes are in an
availablestate. Move the node toavailableif necessary:openstack baremetal node provide NODE_UUID
$ openstack baremetal node provide NODE_UUIDCopy to Clipboard Copied! Toggle word wrap Toggle overflow
8.7. Troubleshooting iDRAC issues 링크 복사링크가 클립보드에 복사되었습니다!
- Redfish management interface fails to set boot device
When you use the
idrac-redfishmanagement interface with certain iDRAC firmware versions and attempt to set the boot device on a bare metal server with UEFI boot, iDRAC returns the following error:Unable to Process the request because the value entered for the parameter Continuous is not supported by the implementation.
Unable to Process the request because the value entered for the parameter Continuous is not supported by the implementation.Copy to Clipboard Copied! Toggle word wrap Toggle overflow If you encounter this issue, set the
force_persistent_boot_deviceparameter in thedriver-infoon the node toNever:openstack baremetal node set --driver-info force_persistent_boot_device=Never ${node_uuid}openstack baremetal node set --driver-info force_persistent_boot_device=Never ${node_uuid}Copy to Clipboard Copied! Toggle word wrap Toggle overflow - Timeout when powering off
Some servers can be too slow when powering off, and time out. The default retry count is
6, which results in a 30 second timeout. To increase the timeout duration to 90 seconds, set theironic::agent::rpc_response_timeoutvalue to18in the undercloud hieradata overrides file and re-run theopenstack undercloud installcommand:ironic::agent::rpc_response_timeout: 18
ironic::agent::rpc_response_timeout: 18Copy to Clipboard Copied! Toggle word wrap Toggle overflow - Vendor passthrough timeout
When iDRAC is not available to execute vendor passthrough commands, these commands take too long and time out:
openstack baremetal node passthru call --http-method GET \ aed58dca-1b25-409a-a32f-3a817d59e1e0 list_unfinished_jobs Timed out waiting for a reply to message ID 547ce7995342418c99ef1ea4a0054572 (HTTP 500)
openstack baremetal node passthru call --http-method GET \ aed58dca-1b25-409a-a32f-3a817d59e1e0 list_unfinished_jobs Timed out waiting for a reply to message ID 547ce7995342418c99ef1ea4a0054572 (HTTP 500)Copy to Clipboard Copied! Toggle word wrap Toggle overflow To increase the timeout duration for messaging, increase the value of the
ironic::default::rpc_response_timeoutparameter in the undercloud hieradata overrides file and re-run theopenstack undercloud installcommand:ironic::default::rpc_response_timeout: 600
ironic::default::rpc_response_timeout: 600Copy to Clipboard Copied! Toggle word wrap Toggle overflow