Troubleshooting Ansible Automation Platform
Troubleshoot issues with Ansible Automation Platform
Abstract
Preface
Use the Troubleshooting Ansible Automation Platform guide to troubleshoot your Ansible Automation Platform installation.
Providing feedback on Red Hat documentation
If you have a suggestion to improve this documentation, or find an error, you can contact technical support at https://access.redhat.com to open a request.
Chapter 1. Diagnosing the problem
To start troubleshooting Ansible Automation Platform, use the must-gather command on OpenShift Container Platform or the sos utility on a VM-based installation to collect configuration and diagnostic information. You can attach the output of these utilities to your support case.
1.1. Troubleshooting Ansible Automation Platform on OpenShift Container Platform by using the must-gather command
The oc adm must-gather command line interface (CLI) command collects information from your Ansible Automation Platform installation deployed on OpenShift Container Platform. It gathers information that is often needed for debugging issues, including resource definitions and service logs.
Running the oc adm must-gather CLI command creates a new directory containing the collected data that you can use to troubleshoot or attach to your support case.
If your OpenShift environment does not have access to registry.redhat.io and you cannot run the must-gather command, then run the oc adm inspect command instead.
Prerequisites
- The OpenShift CLI (oc) is installed.
Procedure
- Log in to your cluster:
  oc login <openshift_url>
- Run one of the following commands based on your level of access in the cluster:
  - Run must-gather across the entire cluster:
    oc adm must-gather --image=registry.redhat.io/ansible-automation-platform-24/aap-must-gather-rhel8 --dest-dir <dest_dir>
    - --image specifies the image that gathers data
    - --dest-dir specifies the directory for the output
  - Run must-gather for a specific namespace in the cluster:
    oc adm must-gather --image=registry.redhat.io/ansible-automation-platform-24/aap-must-gather-rhel8 --dest-dir <dest_dir> -- /usr/bin/ns-gather <namespace>
    - -- /usr/bin/ns-gather limits the must-gather data collection to a specified namespace
- To attach the must-gather archive to your support case, create a compressed file from the must-gather directory created earlier and attach it to your support case.
  For example, on a computer that uses a Linux operating system, run the following command, replacing <must-gather.local.5421342344627712289/> with the must-gather directory name:
  $ tar cvaf must-gather.tar.gz <must-gather.local.5421342344627712289/>
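Where tar is not available, the same compressed archive can be produced with the Python standard library. The following is a minimal sketch, not part of the product tooling; the function name archive_must_gather and its default archive name are assumptions chosen for illustration.

```python
import tarfile
from pathlib import Path


def archive_must_gather(source_dir: str, archive_path: str = "must-gather.tar.gz") -> Path:
    """Pack source_dir into a gzip-compressed tar archive and return its path.

    Hypothetical helper: it mirrors `tar cvaf must-gather.tar.gz <dir>`.
    """
    source = Path(source_dir)
    if not source.is_dir():
        raise NotADirectoryError(f"{source} is not a directory")
    target = Path(archive_path)
    with tarfile.open(target, "w:gz") as archive:
        # arcname stores only the directory name, not its absolute path.
        archive.add(source, arcname=source.name)
    return target
```

Call it with the must-gather directory name, for example archive_must_gather("must-gather.local.5421342344627712289"), and attach the resulting file to your support case.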
1.2. Troubleshooting Ansible Automation Platform on VM-based installations by generating an sos report
The sos utility collects configuration, diagnostic, and troubleshooting data from your Ansible Automation Platform on a VM-based installation.
For more information about installing and using the sos utility, see Generating an sos report for technical support.
Chapter 2. Resources for troubleshooting automation controller
- For information about troubleshooting automation controller, see Troubleshooting automation controller in the Automation Controller Administration Guide.
- For information about troubleshooting the performance of automation controller, see Performance troubleshooting for automation controller in the Automation Controller Administration Guide.
Chapter 3. Backup and recovery
- For information about performing a backup and recovery of Ansible Automation Platform, see Backup and restore in the Automation Controller Administration Guide.
- For information about troubleshooting backup and recovery for installations of Ansible Automation Platform Operator on OpenShift Container Platform, see the Troubleshooting section in Red Hat Ansible Automation Platform operator backup and recovery guide.
Chapter 4. Execution environments
Troubleshoot issues with execution environments.
4.1. Issue - Cannot select the "Use in Controller" option for execution environment image on private automation hub
You cannot use the Use in Controller option for an execution environment image on private automation hub. You also receive the error message: “No Controllers available”.
To resolve this issue, connect automation controller to your private automation hub instance.
Procedure
- Edit the /etc/pulp/settings.py file on private automation hub and add one of the following parameters, depending on your configuration:
  - Single controller
    CONNECTED_ANSIBLE_CONTROLLERS = ['<https://my.controller.node>']
  - Many controllers behind a load balancer
    CONNECTED_ANSIBLE_CONTROLLERS = ['<https://my.controller.loadbalancer>']
  - Many controllers without a load balancer
    CONNECTED_ANSIBLE_CONTROLLERS = ['<https://my.controller.node1>', '<https://my.controller2.node2>']
- Stop all of the private automation hub services:
  # systemctl stop pulpcore.service pulpcore-api.service pulpcore-content.service pulpcore-worker@1.service pulpcore-worker@2.service nginx.service redis.service
- Start all of the private automation hub services again:
  # systemctl start pulpcore.service pulpcore-api.service pulpcore-content.service pulpcore-worker@1.service pulpcore-worker@2.service nginx.service redis.service
Verification
- Verify that you can now use the Use in Controller option in private automation hub.
Chapter 5. Installation
Troubleshoot issues with your installation.
5.1. Issue - Cannot locate certain packages that come bundled with the Ansible Automation Platform installer
You cannot locate certain packages that come bundled with the Ansible Automation Platform installer, or you are seeing a "Repositories disabled by configuration" message.
To resolve this issue, enable the repository by using the subscription-manager command in the command line. For more information about resolving this issue, see the Troubleshooting section of Attaching your Red Hat Ansible Automation Platform subscription in the Red Hat Ansible Automation Platform Planning Guide.
Chapter 6. Jobs
Troubleshoot issues with jobs.
6.1. Issue - Jobs are failing when run against localhost
With Ansible Automation Platform 2 and its containerized execution environments, the usage of localhost has changed. For more information, see Converting playbooks for AAP 2 in the Red Hat Ansible Automation Platform Upgrade and Migration Guide.
6.2. Issue - Jobs are failing with “ERROR! couldn’t resolve module/action” error message
Jobs are failing with the error message “ERROR! couldn’t resolve module/action 'module name'. This often indicates a misspelling, missing collection, or incorrect module path”.
This error can happen when the collection associated with the module is missing from the execution environment.
The recommended resolution is to create a custom execution environment and add the required collections inside of that execution environment. For more information about creating an execution environment, see Using Ansible Builder in Creating and Consuming Execution Environments.
Alternatively, you can complete the following steps:
Procedure
- Create a collections folder inside of the project repository.
- Add a requirements.yml file inside of the collections folder and add the collection:
  collections:
    - <collection_name>
6.3. Issue - Jobs are failing with “Timeout (12s) waiting for privilege escalation prompt” error message
This error can happen when the timeout value is too small, causing the job to stop before completion. The default timeout value for connection plugins is 10 seconds.
To resolve the issue, increase the timeout value by completing one of the following procedures.
The following changes will affect all of the jobs in automation controller. To use a timeout value for a specific project, add an ansible.cfg file in the root of the project directory and add the timeout parameter value to that ansible.cfg file.
Add ANSIBLE_TIMEOUT as an environment variable in the automation controller UI
- Go to automation controller.
- From the navigation panel, select → .
- Under Extra Environment Variables, add the following:
  { "ANSIBLE_TIMEOUT": 60 }
Add a timeout value in the [defaults] section of the ansible.cfg file by using the CLI
- Edit the /etc/ansible/ansible.cfg file and add the following:
  [defaults]
  timeout = 60
Running ad hoc commands with a timeout
- To run an ad hoc playbook from the command line, add the --timeout flag to the ansible-playbook command, for example:
  # ansible-playbook --timeout=60 <your_playbook.yml>
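To limit the change to a single project instead, the timeout can be written to an ansible.cfg file in the root of that project directory. The following is a minimal sketch under stated assumptions: the helper name set_project_timeout is hypothetical, and because it rewrites the file with configparser, any comments in an existing ansible.cfg are not preserved.

```python
import configparser
from pathlib import Path


def set_project_timeout(project_root: str, timeout: int = 60) -> Path:
    """Create or update <project_root>/ansible.cfg with a [defaults] timeout.

    Hypothetical helper for illustration; comments in an existing file are lost.
    """
    cfg_path = Path(project_root) / "ansible.cfg"
    # interpolation=None keeps any literal % characters in existing values intact.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(cfg_path)  # a missing file is silently skipped
    if not parser.has_section("defaults"):
        parser.add_section("defaults")
    parser.set("defaults", "timeout", str(timeout))
    with cfg_path.open("w") as handle:
        parser.write(handle)
    return cfg_path
```

For example, set_project_timeout("/path/to/project", 60) produces a file whose [defaults] section contains timeout = 60, matching the setting shown for /etc/ansible/ansible.cfg.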
6.4. Issue - Jobs in automation controller are stuck in a pending state
After launching jobs in automation controller, the jobs stay in a pending state and do not start.
There are a few reasons jobs can become stuck in a pending state. For more information about troubleshooting this issue, see Playbook stays in pending in the Automation Controller Administration Guide.
Cancel all pending jobs
- Run the following commands to list all of the pending jobs:
  # awx-manage shell_plus
  >>> UnifiedJob.objects.filter(status='pending')
- Run the following command to cancel all of the pending jobs:
  >>> UnifiedJob.objects.filter(status='pending').update(status='canceled')
Cancel a single job by using a job id
- To cancel a specific job, run the following commands, replacing <job_id> with the job id to cancel:
  # awx-manage shell_plus
  >>> UnifiedJob.objects.filter(id=<job_id>).update(status='canceled')
Jobs are failing with the error message "denied: requested access to the resource is denied, unauthorized: Insufficient permissions" when using an execution environment in private automation hub.
This issue happens when your private automation hub is protected with a password or token and the registry credential is not assigned to the execution environment.
Procedure
- Go to automation controller.
- From the navigation panel, select → .
- Click the execution environment assigned to the job template that is failing.
- Click .
- Assign the appropriate Registry credential from your private automation hub to the execution environment.
Chapter 7. Login
Troubleshoot login issues.
7.1. Issue - Logging in to the automation controller UI results in “Invalid username or password. Please try again.”
When you try to log in to the automation controller UI, the login fails and you see the error message: “Invalid username or password. Please try again.”.
One possible cause is that the value for Maximum number of simultaneous logged in sessions is 0. This value determines the maximum number of sessions allowed per user per device. If it is 0, no users can log in to automation controller.
The default value is -1, which disables the limit, so there is no imposed maximum on the number of sessions.
Procedure
- As root user, run the following command from the command line to set the SESSIONS_PER_USER variable to -1, which disables the maximum sessions allowed:
  # echo "settings.SESSIONS_PER_USER = -1" | awx-manage shell_plus --quiet
Verification
- Verify that you can log in successfully to automation controller.
Chapter 8. Networking
Troubleshoot networking issues.
8.1. Issue - The default subnet used in Ansible Automation Platform containers conflicts with the internal network
The default subnet used in Ansible Automation Platform containers conflicts with the internal network resulting in "No route to host" errors.
To resolve this issue, update the default classless inter-domain routing (CIDR) value that the containers use so that it no longer conflicts with your internal network.
Procedure
- On all controller and hybrid nodes, run the following commands to create a file called custom.py:
  # touch /etc/tower/conf.d/custom.py
  # chmod 640 /etc/tower/conf.d/custom.py
  # chown root:awx /etc/tower/conf.d/custom.py
- Add the following to the /etc/tower/conf.d/custom.py file:
  DEFAULT_CONTAINER_RUN_OPTIONS = ['--network', 'slirp4netns:enable_ipv6=true,cidr=192.0.2.0/24']
  - 192.0.2.0/24 is the value for the new CIDR in this example.
- Stop and start the automation controller service on all controller and hybrid nodes:
  # automation-controller-service stop
  # automation-controller-service start
  All containers will start on the new CIDR.
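Before you choose a new CIDR, it can help to confirm that it does not overlap any network already in use. The following is a minimal sketch using the Python standard library; the function name conflicting_networks and the internal network ranges are assumptions for illustration, so substitute the ranges used at your site.

```python
import ipaddress


def conflicting_networks(candidate_cidr: str, internal_cidrs: list[str]) -> list[str]:
    """Return the internal networks that overlap candidate_cidr."""
    candidate = ipaddress.ip_network(candidate_cidr)
    return [
        cidr
        for cidr in internal_cidrs
        if ipaddress.ip_network(cidr).overlaps(candidate)
    ]


# Hypothetical internal networks; replace with the ranges used at your site.
internal = ["10.88.0.0/16", "192.168.1.0/24"]
print(conflicting_networks("192.0.2.0/24", internal))  # prints []
print(conflicting_networks("10.88.4.0/24", internal))  # prints ['10.88.0.0/16']
```

An empty list means that the candidate CIDR does not overlap any of the listed networks and is a reasonable value for the cidr option.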
Chapter 9. Playbooks
You can use automation content navigator to interactively troubleshoot your playbook. For more information about troubleshooting a playbook with automation content navigator, see Troubleshooting Ansible content with automation content navigator in the Automation content navigator creator guide.
Chapter 10. Subscriptions
For information about keeping your automation controller subscription in compliance, see Troubleshooting: Keep your subscription in compliance in the Automation Controller User Guide.