11.3. Troubleshooting Overcloud Creation
There are three layers where the deployment can fail:
- Orchestration (heat and nova services)
- Bare Metal Provisioning (ironic service)
- Post-Deployment Configuration (Puppet)
If an Overcloud deployment has failed at any of these levels, use the OpenStack clients and service log files to diagnose the failed deployment.
11.3.1. Orchestration
In most cases, Heat shows the failed Overcloud stack after the Overcloud creation fails:
$ heat stack-list
+-----------------------+------------+--------------------+----------------------+
| id                    | stack_name | stack_status       | creation_time        |
+-----------------------+------------+--------------------+----------------------+
| 7e88af95-535c-4a55... | overcloud  | CREATE_FAILED      | 2015-04-06T17:57:16Z |
+-----------------------+------------+--------------------+----------------------+
If the stack list is empty, this indicates an issue with the initial Heat setup. Check your Heat templates and configuration options, and check for any error messages that appeared after running openstack overcloud deploy.
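When the stack list is long, or the check is part of a script, the table output can be filtered programmatically. The following is a minimal sketch, not part of the director tooling: it assumes the client prints the ASCII table layout shown above, and the helper names parse_cli_table and failed_stacks are hypothetical.

```python
def parse_cli_table(text):
    """Parse an ASCII table (as printed by `heat stack-list`) into dicts.

    Assumption: data rows start with '|' and border rows start with '+'.
    """
    rows = []
    for line in text.splitlines():
        line = line.strip()
        # Skip the command echo, blank lines, and +----+ border lines.
        if not line.startswith("|"):
            continue
        rows.append([cell.strip() for cell in line.strip("|").split("|")])
    if not rows:
        return []
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


def failed_stacks(text):
    """Return the names of stacks whose status ends in _FAILED."""
    return [
        row["stack_name"]
        for row in parse_cli_table(text)
        if row["stack_status"].endswith("_FAILED")
    ]


# Sample copied from the `heat stack-list` output in this section.
SAMPLE = """\
+-----------------------+------------+--------------------+----------------------+
| id                    | stack_name | stack_status       | creation_time        |
+-----------------------+------------+--------------------+----------------------+
| 7e88af95-535c-4a55... | overcloud  | CREATE_FAILED      | 2015-04-06T17:57:16Z |
+-----------------------+------------+--------------------+----------------------+
"""

print(failed_stacks(SAMPLE))
```

An empty list from parse_cli_table corresponds to the empty stack list case described above.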
11.3.2. Bare Metal Provisioning
Check ironic to see all registered nodes and their current status:
$ ironic node-list
+----------+------+---------------+-------------+-----------------+-------------+
| UUID     | Name | Instance UUID | Power State | Provision State | Maintenance |
+----------+------+---------------+-------------+-----------------+-------------+
| f1e261...| None | None          | power off   | available       | False       |
| f0b8c1...| None | None          | power off   | available       | False       |
+----------+------+---------------+-------------+-----------------+-------------+
Here are some common issues that arise from the provisioning process.
- Review the Provision State and Maintenance columns in the resulting table. Check for the following:
  - An empty table, or fewer nodes than you expect
  - Maintenance is set to True
  - Provision State is set to manageable

  This usually indicates an issue with the registration or discovery processes. For example, if Maintenance sets itself to True automatically, the nodes are usually using the wrong power management credentials.
- If Provision State is available, the problem occurred before bare metal deployment started.
- If Provision State is active and Power State is power on, the bare metal deployment has finished successfully. This means that the problem occurred during the post-deployment configuration step.
- If Provision State is wait call-back for a node, the bare metal provisioning process has not yet finished for this node. Wait until this status changes. If it does not change, connect to the virtual console of the failed node and check the output.
- If Provision State is error or deploy failed, bare metal provisioning has failed for this node. Check the bare metal node's details:
  $ ironic node-show [NODE UUID]
  Look for the last_error field, which contains a description of the error. If the error message is vague, you can use the logs to clarify it:
  $ sudo journalctl -u openstack-ironic-conductor -u openstack-ironic-api
- If you see a wait timeout error and the node Power State is power on, connect to the virtual console of the failed node and check the output.
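The state checks above amount to a small decision table. The following sketch encodes them as a single function so that each row of the ironic node-list output can be mapped to a next step. The function name diagnose_node and the wording of the returned hints are hypothetical; the states and their meanings are taken from the list above.

```python
def diagnose_node(provision_state, power_state, maintenance):
    """Map one `ironic node-list` row to the hint given in this section.

    maintenance is a bool; convert the table's "True"/"False" text first.
    """
    if maintenance:
        return "registration or discovery issue; check power management credentials"
    if provision_state == "manageable":
        return "registration or discovery issue"
    if provision_state == "available":
        return "problem occurred before bare metal deployment started"
    if provision_state == "active" and power_state == "power on":
        return "bare metal deployment succeeded; check post-deployment configuration"
    if provision_state == "wait call-back":
        return "provisioning not finished; wait, or check the node's virtual console"
    if provision_state in ("error", "deploy failed"):
        return "provisioning failed; check last_error in ironic node-show"
    # Any other combination is not covered by this section.
    return "no guidance in this section for this state"


# The two nodes from the sample table: power off, available, not in maintenance.
print(diagnose_node("available", "power off", False))
```

The maintenance check comes first because, per the list above, a node in maintenance points to a registration problem regardless of its provision state. That ordering is a design choice of this sketch, not something the ironic client enforces.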
11.3.3. Post-Deployment Configuration
Many things can go wrong during the configuration stage. For example, a particular Puppet module could fail to complete due to an issue with the setup. This section provides a process to diagnose such issues.
Procedure 11.4. Diagnosing Post-Deployment Configuration Issues
- List all the resources from the Overcloud stack to see which one failed:
  $ heat resource-list overcloud
  This shows a table of all resources and their states. Look for any resources with a CREATE_FAILED status.
- Show the failed resource:
  $ heat resource-show overcloud [FAILED RESOURCE]
  Check for any information in the resource_status_reason field that can help your diagnosis.
- Use the nova command to see the IP addresses of the Overcloud nodes.
  $ nova list
  Log in as the heat-admin user to one of the deployed nodes. For example, if the stack's resource list shows the error occurred on a Controller node, log in to a Controller node. The heat-admin user has sudo access.
  $ ssh heat-admin@192.0.2.14
- Check the os-collect-config log for a possible reason for the failure.
  $ sudo journalctl -u os-collect-config
- In some cases, nova fails to deploy the node entirely. This situation is indicated by a failed OS::Heat::ResourceGroup for one of the Overcloud role types. Use nova to see the failure in this case.
  $ nova list
  $ nova show [SERVER ID]
  The most common error shown references the error message No valid host was found. See Section 11.5, “Troubleshooting "No Valid Host Found" Errors” for details on troubleshooting this error. In other cases, look at the following log files for further troubleshooting:
  - /var/log/nova/*
  - /var/log/heat/*
  - /var/log/ironic/*
- Use the SOS toolset, which gathers information about system hardware and configuration. Use this information for diagnostic purposes and debugging. SOS is commonly used to help support technicians and developers, and is useful on both the Undercloud and Overcloud. Install the sos package:
  $ sudo yum install sos
  Generate a report:
  $ sudo sosreport --all-logs
The post-deployment process for Controller nodes consists of six main steps:
| Step | Description |
|---|---|
| ControllerLoadBalancerDeployment_Step1 | Initial load balancing software configuration, including Pacemaker, RabbitMQ, Memcached, Redis, and Galera. |
| ControllerServicesBaseDeployment_Step2 | Initial cluster configuration, including Pacemaker configuration, HAProxy, MongoDB, Galera, Ceph Monitor, and database initialization for OpenStack Platform services. |
| ControllerRingbuilderDeployment_Step3 | Initial ring build for OpenStack Object Storage (swift). |
| ControllerOvercloudServicesDeployment_Step4 | Configuration of all OpenStack Platform services (nova, neutron, cinder, sahara, ceilometer, heat, horizon, aodh, gnocchi). |
| ControllerOvercloudServicesDeployment_Step5 | Configure service start up settings in Pacemaker, including constraints to determine service start up order and service start up parameters. |
| ControllerOvercloudServicesDeployment_Step6 | Final pass of the Overcloud configuration. |
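If one of these steps appears as a failed resource, it can help to see which steps come before and after it. The sketch below assumes the steps run in the numeric order their names suggest, and that the failed resource name matches one of the names in the table exactly; the helper steps_around is hypothetical.

```python
# Step names copied from the table above, in numeric order.
CONTROLLER_STEPS = [
    "ControllerLoadBalancerDeployment_Step1",
    "ControllerServicesBaseDeployment_Step2",
    "ControllerRingbuilderDeployment_Step3",
    "ControllerOvercloudServicesDeployment_Step4",
    "ControllerOvercloudServicesDeployment_Step5",
    "ControllerOvercloudServicesDeployment_Step6",
]


def steps_around(failed_step):
    """Split the step list into (steps before, steps after) the failed one.

    Raises ValueError if failed_step is not one of the six known names.
    """
    index = CONTROLLER_STEPS.index(failed_step)
    return CONTROLLER_STEPS[:index], CONTROLLER_STEPS[index + 1:]


before, after = steps_around("ControllerRingbuilderDeployment_Step3")
print("steps before the failure:", before)
print("steps after the failure:", after)
```

Under the ordering assumption, the steps before the failure have already run, which narrows down which services were configured when the error occurred.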