Chapter 13. Troubleshooting the Red Hat Ansible Automation Platform Operator on OpenShift Container Platform
This guide provides a collection of commands and tips to help you diagnose and resolve common issues with your Ansible Automation Platform deployment on OpenShift Container Platform. You will learn how to view logs, inspect resources, and collect diagnostic data for support.
13.1. Understanding automation controller operator logs
When the operator deploys an automation controller instance, it runs an installer role inside the operator container. If the automation controller’s status is `Failed`, check the `automation-controller-operator` container logs. These logs contain the installer role’s output and are a critical first step in debugging deployment issues.
13.2. Viewing events in the OpenShift Container Platform
You can view events in the OpenShift Container Platform web console to monitor for errors and troubleshoot issues. This helps you quickly diagnose problems by examining the status of custom resources and their related events.
You can debug by first reviewing the status conditions of the Ansible Automation Platform custom resource (CR) and then checking any nested CRs for errors.
Procedure
- Log in to the OpenShift Container Platform web console.
- In the navigation menu, select Home > Events.
- Select your project from the project list.
- To view events for a specific resource, navigate to that resource’s page. Many resource pages, such as pods and deployments, have their own Events tab.
- Select a pod to open its Pod details page.
Verification
Check the Conditions section on the Pod details page to confirm no errors are listed in the Message column.
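The same checks can also be made from the command line. The following is a minimal sketch, assuming the `oc` CLI is already logged in to the cluster. The function names and the default namespace `aap` are hypothetical and chosen only for illustration.

```shell
# List Warning events in a namespace, sorted by time of last occurrence.
# $1: namespace (defaults to "aap", an assumed example namespace)
aap_warning_events() {
  oc get events -n "${1:-aap}" --field-selector type=Warning --sort-by=.lastTimestamp
}

# Print type, status, and message for each status condition of a CR.
# $1: resource type, $2: resource name, $3: namespace (defaults to "aap")
aap_cr_conditions() {
  oc get "$1" "$2" -n "${3:-aap}" \
    -o jsonpath='{range .status.conditions[*]}{.type}{"\t"}{.status}{"\t"}{.message}{"\n"}{end}'
}
```

For example, `aap_cr_conditions automationcontroller aap` prints one line per condition of the `aap` automation controller CR, which makes a failed condition and its message easy to spot.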
13.3. Viewing operator logs
The following procedure shows how to view the logs for an `automation-controller-operator` pod.
Procedure
To find the pod name, run:

    oc get pods | grep operator

To view the logs for the pod, run:

    oc logs <operator-pod-name> -f

Alternatively, to view the logs without first getting the pod name, run:

    oc logs deployments/automation-controller-operator-controller-manager -c automation-controller-manager -f
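Operator logs can be long, so it often helps to reduce them to the task headers and failure lines. The following is a small sketch of such a filter. It assumes the installer role's output uses the usual Ansible markers (`TASK [` for task headers and `fatal: [` for failed tasks); the function name is hypothetical.

```shell
# Keep only Ansible task headers and failure lines from a log stream on stdin.
# Returns success even when nothing matches, so it is safe in pipelines.
filter_failed_tasks() {
  grep -E 'TASK \[|fatal: \[' || true
}
```

A possible use is `oc logs deployments/automation-controller-operator-controller-manager -c automation-controller-manager --tail=500 | filter_failed_tasks`, which shows each failure next to the most recent task headers.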
13.4. Configuring log verbosity
You can enable task output for debugging on any custom resource (CR) by setting `no_log` to `false` in the `spec` section of the CR.
The logs then show output for any failed tasks that originally had `no_log` set to `true`. The following procedure uses automation controller as an example, but every CR listed in the Core Ansible Automation Platform resources section supports `no_log`.
Procedure
Edit the automation controller CR and set the `no_log` field to `false` in the spec.

Note: This might expose sensitive data in the logs. On production clusters, keep this value set to `true` unless you are actively debugging an issue.

To increase the Ansible Playbook verbosity from the operator, set the verbosity level using an annotation:

    annotations:
      ansible.sdk.operatorframework.io/verbosity: "4"
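Both settings can also be applied without opening an editor. The following is a hedged sketch using `oc patch` and `oc annotate`. It assumes the `no_log` field is a boolean in the CR schema (quote the value if your version expects a string), and the function names are hypothetical.

```shell
# Turn on task output and raise playbook verbosity for one automation controller CR.
# $1: CR name, $2: namespace, $3: verbosity level (defaults to 4)
aap_enable_debug() {
  oc patch automationcontroller "$1" -n "$2" --type merge -p '{"spec":{"no_log":false}}' &&
    oc annotate automationcontroller "$1" -n "$2" --overwrite \
      "ansible.sdk.operatorframework.io/verbosity=${3:-4}"
}

# Restore the safer setting once debugging is finished.
aap_disable_debug() {
  oc patch automationcontroller "$1" -n "$2" --type merge -p '{"spec":{"no_log":true}}'
}
```

For example, `aap_enable_debug aap aap` targets the CR named `aap` in the `aap` namespace. Run `aap_disable_debug aap aap` afterwards so that sensitive data stops appearing in the logs.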
13.5. Inspecting an OpenShift Container Platform resource
To inspect an OpenShift Container Platform resource, use the `oc` command to get a summary or the full YAML definition of the resource.
Procedure
To view a human-readable summary of a resource, run:

    oc describe -n <namespace> <resource> <resource-name>

To view the complete YAML definition of a resource, use the `-o yaml` flag:

    oc get -n <namespace> <resource> <resource-name> -o yaml

For example, to get the YAML for the `automationcontroller` custom resource, run:

    oc get -n aap automationcontroller aap -o yaml
13.6. Core Ansible Automation Platform resources
The following table lists and describes the core custom resources (CRs) that the Ansible Automation Platform Operator manages. Understanding these resources will help you with advanced troubleshooting and configuration.
| Resource name | Description |
|---|---|
| AnsibleAutomationPlatform | CR for deploying the entire Ansible Automation Platform. |
| AnsibleAutomationPlatformBackup | CR for creating backups of the entire Ansible Automation Platform instance. |
| AnsibleAutomationPlatformRestore | CR for restoring the entire Ansible Automation Platform instance from a backup. |
| AutomationController | CR defining the desired state of an automation controller instance. |
| AutomationControllerBackup | CR for creating backups of automation controller data and configuration. |
| AutomationControllerRestore | CR for restoring the automation controller from a backup. |
| AutomationHub | CR for deploying an automation hub (Galaxy) instance. |
| AutomationHubBackup | CR for creating backups of automation hub data and configuration. |
| AutomationHubRestore | CR for restoring automation hub from a backup. |
| EDA | CR for deploying an Event-Driven Ansible (EDA) instance. |
| EDABackup | CR for creating backups of EDA data and configuration. |
| EDARestore | CR for restoring EDA from a backup. |
| AnsibleLightspeed | CR for deploying a Red Hat Ansible Lightspeed instance. |
13.7. Standard Kubernetes resources
Standard Kubernetes resources are a core part of the OpenShift Container Platform. The following table describes the standard resources you can inspect to troubleshoot the state and configuration of an application.
| Resource name | Description |
|---|---|
| pod | Smallest deployable unit containing one or more containers running the application workloads. |
| deployment | Manages pod configuration and scaling. |
| pvc | A PersistentVolumeClaim (PVC) is a request for storage resources, used for persistent data storage. |
| service | Exposes pods as network services with stable IP addresses and DNS names within the cluster. |
| ingress | Manages external HTTP and HTTPS access to services within the cluster. |
| route | An OpenShift-specific resource for exposing services externally (similar to an ingress). |
| secret | Stores sensitive data like passwords, tokens, and certificates. |
| serviceaccount | Provides an identity for processes running in pods so that they can be granted permissions on other Kubernetes resources. |
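To get a quick overview of these standard resources in one namespace, a single `oc get` call with a comma-separated list of resource types is usually enough. The following is a minimal sketch; the function name and the default namespace `aap` are assumptions made for illustration.

```shell
# List the standard resources described above for one namespace.
# $1: namespace (defaults to "aap", an assumed example namespace)
aap_list_resources() {
  oc get pods,deployments,pvc,services,ingress,routes,secrets,serviceaccounts \
    -n "${1:-aap}"
}
```

Any entry that looks wrong in the listing can then be examined with `oc describe` or `oc get ... -o yaml`.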
13.8. Discovering custom resource definition configuration parameters
The Ansible Automation Platform Operator manages multiple custom resources (CRs), each with its own configuration parameters. Use the `oc explain` command to discover all available configuration options for the `AnsibleAutomationPlatform` CR and its nested components.
Procedure
To see all available configuration parameters for a top-level CR, run:

    oc explain ansibleautomationplatform.spec

To view specific nested sections, query them directly:

    oc explain automationcontroller.spec.postgres_configuration_secret
    oc explain automationcontroller.spec.route_tls_termination_mechanism

To explore all nested fields at once, use the `--recursive` flag:

    oc explain automationcontroller.spec --recursive
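The recursive listing is long, so searching it for a keyword can be quicker than reading it in full. The following is a small sketch that pipes `oc explain --recursive` through `grep`; the function name is hypothetical.

```shell
# Search the recursive field listing of a CR spec for a keyword (case-insensitive).
# $1: resource type, $2: keyword
aap_find_spec_fields() {
  oc explain "$1.spec" --recursive | grep -i -- "$2"
}
```

For example, `aap_find_spec_fields automationcontroller postgres` lists the PostgreSQL-related fields, which you can then query individually with `oc explain`.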
13.9. Collecting diagnostic data
Use the `oc adm must-gather` command to collect comprehensive diagnostic data about your cluster and the Ansible Automation Platform components. This data is essential when contacting Red Hat Support.
Procedure
To start the `must-gather` tool, run:

    oc adm must-gather --image=registry.redhat.io/ansible-automation-platform-25/aap-must-gather-rhel8

To view the collected data, use the `omc` tool to query the `must-gather` tarball as if it were a live cluster:

    omc use <path-to-must-gather>
    omc get pods
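When the data is going to be attached to a support case, it can be convenient to write it to a known directory and compress it in one step. The following is a hedged sketch that uses the `--dest-dir` option of `oc adm must-gather`; the function name and the default directory `./aap-must-gather` are hypothetical.

```shell
# Run must-gather into a chosen directory, then pack the result as a tarball.
# $1: destination directory (defaults to ./aap-must-gather, an assumed name)
aap_must_gather() {
  dest="${1:-./aap-must-gather}"
  oc adm must-gather \
    --image=registry.redhat.io/ansible-automation-platform-25/aap-must-gather-rhel8 \
    --dest-dir="$dest" &&
    tar -czf "$dest.tar.gz" -C "$(dirname "$dest")" "$(basename "$dest")"
}
```

The tarball is created only if the collection succeeds, so a failed run does not leave a partial archive behind.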
13.10. Debugging crashing pods
If a pod is failing or crashing, use the `oc debug` command. This command creates a new pod with the same configuration and mounts as the pod you specified, allowing you to access it for debugging.
Procedure
To connect to the pod, run:

    oc debug <pod-name>
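Before starting a debug pod, you first need to know which pods are crashing. The following is a small sketch that reads the output of `oc get pods --no-headers` and prints the pods that have restarted at least once. It assumes the default column layout, in which the restart count is the fourth field; the function name is hypothetical.

```shell
# Read `oc get pods --no-headers` output on stdin and print the names of
# pods whose RESTARTS column (4th field) is greater than zero.
pods_with_restarts() {
  awk '$4 + 0 > 0 { print $1 }'
}
```

A possible use is `oc get pods -n <namespace> --no-headers | pods_with_restarts`. For each pod it reports, `oc logs <pod-name> --previous` shows the output of the container instance that crashed, and `oc debug <pod-name>` opens a copy of the pod for inspection.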
13.11. Operator service account error
Manually modifying the `aap_operator_service_account` user in the Ansible Automation Platform database or UI removes the required `is_superuser` flag. This action causes a critical failure in the platform gateway operator’s reconciliation loop.
You see the following error:

    TASK [ansibleautomationplatform : Create operator service account user] … CommandError: Error: That username is already taken
The Ansible Automation Platform operator recreates the service account only when the account is missing. To restore the required superuser privileges, you must therefore remove the existing, incorrectly configured user.
After you delete the user, the platform gateway operator automatically runs its idempotency logic, recreates the account with the necessary `is_superuser=True` flag, and the reconciliation loop resumes.