Chapter 8. Rulebook activations troubleshooting
Rulebook activations might occasionally fail due to various reasons. While many issues can be resolved through basic checks, diagnosing failures across a distributed system requires robust logging.
Event-Driven Ansible’s enhanced logging adds unique tracking identifiers to all log output so that you can follow a single request or activation across services.
This chapter lists issues that can cause activation failures and suggests how to resolve them. For detailed log filtering using the new identifiers, see Event-Driven Ansible log filtering.
8.1. Event-Driven Ansible log filtering
Event-Driven Ansible includes tracking identifiers in all log output to significantly improve troubleshooting. These identifiers help track user actions and activation processes across multiple services and log files.
| Identifier | Abbreviation | Purpose | Location |
|---|---|---|---|
| X-REQUEST-ID | rid | Tracks HTTP requests from the platform gateway through the entire Event-Driven Ansible request lifecycle. Use this to correlate UI actions or API calls with backend processing. | Included in the HTTP response headers and Event-Driven Ansible log entries. |
| Log Tracking ID | tid | Tracks the activation lifecycle from creation through completion, persisting across restarts and multiple log files. | Included in all activation-related log entries. It can be obtained from the activation History tab in the UI. |
| Activation Instance ID | aiid | Identifies the logs specific to a single execution instance of a rulebook activation. | Included in activation logs. |
Not all processes originate from a user or external client. When an Event-Driven Ansible orchestrator internally triggers a process (for example, a monitor request), the rid UUID is generated internally to track that process lifecycle and will not be present in the platform gateway logs.
The enhanced log format places these identifiers at the start of the message, making them easy to filter:
[rid: <UUID>] [tid: <UUID>] [aiid: <ID>] aap_eda.tasks.orchestrator Processing request…
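As a minimal sketch of how this prefix could be read programmatically, the following Python snippet extracts the bracketed identifiers from a single log line. It assumes only the format shown above; the function name `parse_identifiers` and the sample values are hypothetical, and real log lines may carry additional fields such as timestamps.

```python
import re

# Matches the documented bracketed fields: [rid: ...] [tid: ...] [aiid: ...]
# Assumption: any of the three fields may be absent from a given line.
FIELD_RE = re.compile(r"\[(rid|tid|aiid):\s*([^\]\s]+)\]")


def parse_identifiers(line: str) -> dict:
    """Return the tracking identifiers found in one log line."""
    return dict(FIELD_RE.findall(line))


# Hypothetical sample line that follows the documented format.
sample = (
    "[rid: 11111111-2222-3333-4444-555555555555] "
    "[tid: aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee] "
    "[aiid: 42] aap_eda.tasks.orchestrator Processing request"
)
ids = parse_identifiers(sample)
print(ids["tid"])
```

Lines without any identifier produce an empty dictionary, so the same function can be used to skip unrelated log output.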
8.1.1. Using log filtering for troubleshooting
Use the unique tracking identifiers to search and filter your system logs. This procedure helps you pinpoint the timeline and cause of failure for a specific rulebook activation or API request.
Procedure
Collect identifiers:
- When an issue occurs, retrieve the Log Tracking ID (tid) from the failed activation instance’s logs in the UI History tab.
- If the issue was triggered by a user action (like restarting an activation), obtain the X-REQUEST-ID (rid) from the HTTP response headers.
Search system logs:
- Use the collected UUID to search through your backend logs (worker, scheduler, API, and so on). This filters out irrelevant noise so that you can focus on the full timeline of the specific request or activation across all services.
Correlate timeline:
- Use the common tid to follow the activation’s progress (or failure) across different log files and services.
Use support tools:
- If necessary, use sosreport or mustgather tools, which automatically collect all relevant Event-Driven Ansible logs from /var/log/ansible-automation-platform/eda/.
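The search and correlation steps can be sketched in Python as follows. This is an illustration under stated assumptions, not a supported tool: it assumes the logs are plain-text files ending in `.log` inside one directory and that each relevant line contains the exact prefix `[tid: <value>]` or `[rid: <value>]`. The function names and the demo file contents are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Iterable, Iterator, List, Tuple


def matching_lines(lines: Iterable[str], key: str, value: str) -> Iterator[str]:
    """Yield the lines that carry the given identifier, for example key='tid'."""
    needle = f"[{key}: {value}]"
    for line in lines:
        if needle in line:
            yield line.rstrip("\n")


def search_log_dir(log_dir: str, key: str, value: str) -> List[Tuple[str, str]]:
    """Collect (file name, line) pairs for one identifier across all *.log files."""
    hits = []
    for path in sorted(Path(log_dir).glob("*.log")):
        with path.open(encoding="utf-8", errors="replace") as handle:
            for line in matching_lines(handle, key, value):
                hits.append((path.name, line))
    return hits


# Demo with hypothetical log content written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "api.log").write_text(
        "[rid: r-1] [tid: t-1] api accepted restart\n"
        "[rid: r-2] [tid: t-2] api unrelated request\n",
        encoding="utf-8",
    )
    Path(tmp, "worker.log").write_text(
        "[tid: t-1] [aiid: 7] worker started activation\n",
        encoding="utf-8",
    )
    timeline = search_log_dir(tmp, "tid", "t-1")

for file_name, line in timeline:
    print(f"{file_name}: {line}")
```

Results are grouped by file name rather than by time. If your log lines include timestamps, sorting the collected lines on that field would give a true cross-service timeline.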
8.2. Activation stuck in Pending state
Perform the following steps if your rulebook activation is stuck in Pending state.
Procedure
Confirm whether there are other running activations and if you have reached the limits (for example, memory or CPU limits).
- If there are other activations running, terminate one or more of them, if possible.
- If not, check that the default worker, Redis, and activation worker are all running.
- If all systems are working as expected, check your eda-server internal logs in the worker, scheduler, API, and nginx containers and services to see if the problem can be determined. These logs reveal the source of the issue, such as an exception thrown by the code, a runtime error with network issues, or an error with the rulebook code. If your internal logs do not provide information that leads to resolution, report the issue to Red Hat support.
8.3. Activation keeps restarting
Perform the following steps if your rulebook activation keeps restarting.
Procedure
- Log in to Ansible Automation Platform.
- From the navigation panel, select Rulebook Activations.
- From the Rulebook Activations page, select the activation in your list that keeps restarting. The Details page is displayed.
- Click the History tab for more information and select the rulebook activation that keeps restarting. The Details tab is displayed and shows the output information.
Check the Restart policy field for your activation.
There are three selections available: On failure (restarts a rulebook activation when the container process fails), Always (always restarts regardless of success or failure with no more than 5 restarts), or Never (never restarts when the container process ends).
- Confirm whether your rulebook activation Restart policy is set to On failure. With this policy, repeated restarts indicate that an issue is causing the container process to fail.
- To diagnose the problem, check the YAML code and the instance logs of the rulebook activation for errors.
- If you cannot find a solution with the restart policy values, proceed to the next steps related to the Log level.
Check your log level for your activation.
- If your default log level is Error, go back to the Rulebook Activations page and recreate your activation following the procedures in Setting up a rulebook activation.
- Change the Log level to Debug.
- Run the activation again and navigate to the History tab from the activation details page.
- On the History page, click one of your recent activations and view the Output.
8.4. Events are not being processed by rulebook activation
If your rulebook activation is running but not processing events, the most common cause is a mismatch between the expected event source and the source defined in the rulebook.
Procedure
- Check the rulebook source: Review the source plugin defined in your rulebook YAML (for example, ansible.eda.webhook, ansible.eda.kafka).
- Verify event input: Confirm that the events you are sending to Event-Driven Ansible controller are compatible with the source plugin defined in the rulebook. If the rulebook expects a Kafka message, it cannot process a generic webhook event.
- Confirm activation mapping: If you are using event streams, ensure the correct event stream is mapped to the rulebook during the activation setup. A mismatch here will result in the activation receiving no data.
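The first two checks can be illustrated with a short Python sketch. It assumes the rulebook YAML has already been loaded into Python data (for example with a YAML parser) and follows the usual rulebook layout, in which each entry under `sources` is a mapping keyed by the plugin name, optionally alongside `name` and `filters` keys. The rulebook content and the helper names are hypothetical.

```python
# Hypothetical rulebook, shown as the data structure a YAML parser would return.
rulebook = [
    {
        "name": "Respond to webhook alerts",
        "hosts": "all",
        "sources": [
            {
                "name": "alerts",
                "ansible.eda.webhook": {"host": "0.0.0.0", "port": 5000},
            },
        ],
        "rules": [],
    }
]

# Keys that can appear in a source entry but do not name a plugin.
NON_PLUGIN_KEYS = {"name", "filters"}


def source_plugins(parsed_rulebook: list) -> list:
    """List the source plugin names declared across all rulesets."""
    plugins = []
    for ruleset in parsed_rulebook:
        for source in ruleset.get("sources", []):
            plugins.extend(key for key in source if key not in NON_PLUGIN_KEYS)
    return plugins


def sender_matches(parsed_rulebook: list, sender_plugin: str) -> bool:
    """Report whether the plugin your sender targets is declared in the rulebook."""
    return sender_plugin in source_plugins(parsed_rulebook)


print(source_plugins(rulebook))
print(sender_matches(rulebook, "ansible.eda.kafka"))
```

In this example the rulebook declares only a webhook source, so a sender that publishes Kafka messages would not be matched.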
8.5. Actions not triggering despite receiving events
If your rulebook activation is Running and successfully receiving events, but no actions are being executed, the issue is likely within the logic of your rulebook.
Procedure
- Check rule conditions: Review the rulebook YAML to confirm that the conditions (the when statements) are accurately written and precisely match the structure and values of the incoming event payload.
- Verify indentation and syntax: Ensure all rulebook syntax and indentation are correct, as a simple error can prevent the rule engine from evaluating conditions.
- Validate actions: Confirm that the specified action is a recognized and correctly configured action (for example, run_job_template with the proper arguments).
8.6. Event streams not sending events to activation
If you are using event streams to send events to your rulebook activations, occasionally those events might not be successfully routed to your rulebook activation.
Procedure
Try the following options to resolve this.
- Ensure that none of your event streams in Event-Driven Ansible controller is in Test mode. In Test mode, activations do not receive the events.
- Verify that the origin service is sending the request properly.
- Check that the network connection to your platform gateway instance is stable. If you have set up event streams, the platform gateway is the entry point for event stream requests from the sender.
- Verify that the proxy in the platform gateway is running.
- Confirm that the event stream worker is up and running, and able to process the request.
- Verify that your credential is correctly set up in the event stream.
Confirm that the request complies with the authentication mechanism determined by the configured credential. For example, a basic authentication request must contain a header with the credentials, and an HMAC request must contain the signature of the content in a header.
Note: The credentials might have been changed in Event-Driven Ansible controller, but not updated in the origin service.
- Verify that the rulebook that is running in the activation reacts to these events. That is, confirm that the rulebook defines the event source and includes rules with actions that consume the incoming events. Otherwise, the event reaches the activation but nothing is triggered.
- If you are using self-signed certificates, you might want to disable certificate validation when sending webhooks from vendors. Most of the vendors have an option to disable certificate validation for testing or non-production environments.
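When checking the authentication step above, it can help to reproduce the expected header values locally and compare them with what the origin service sends. The following Python sketch assumes HMAC with SHA-256 and hexadecimal encoding, which are only one possible configuration; the actual algorithm, encoding, and header name depend on how the credential is set up. All names and values here are hypothetical.

```python
import base64
import hashlib
import hmac


def basic_auth_header(username: str, password: str) -> str:
    """Build the value of an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def hmac_signature(secret: bytes, body: bytes) -> str:
    """Sign the raw request body. Assumption: SHA-256 with hex output."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def signature_is_valid(secret: bytes, body: bytes, received: str) -> bool:
    """Compare signatures in constant time to avoid timing leaks."""
    return hmac.compare_digest(hmac_signature(secret, body), received)


# Hypothetical secret and payload for a local check.
secret = b"example-shared-secret"
body = b'{"status": "down"}'
signature = hmac_signature(secret, body)

print(basic_auth_header("user", "pass"))
print(signature_is_valid(secret, body, signature))
```

The signature covers the exact bytes of the body, so any reformatting of the payload between signing and sending causes validation to fail.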
8.7. Cannot connect to the 2.5 automation controller when running activations
You might experience a failed connection to automation controller when you run your activations.
Procedure
To help resolve the issue, confirm that you have set up a Red Hat Ansible Automation Platform credential and have obtained the correct automation controller URL.
- If you have not set up a Red Hat Ansible Automation Platform credential, follow the procedures in Setting up a Red Hat Ansible Automation Platform credential. Ensure that this credential has the host set to the following URL format: https://<your_gateway>/api/controller
- When you have completed this process, try setting up your rulebook activation again.
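As a quick sanity check of the host value, the following Python sketch compares a URL against the format shown above. It validates only the shape of the URL (scheme, host, and path) and does not contact the gateway; the function name and the example host names are hypothetical.

```python
from urllib.parse import urlparse


def check_controller_url(url: str) -> list:
    """Return a list of problems found in the credential host URL."""
    problems = []
    parsed = urlparse(url)
    if parsed.scheme != "https":
        problems.append("scheme should be https")
    if not parsed.netloc:
        problems.append("gateway host is missing")
    # A trailing slash is tolerated in this sketch.
    if parsed.path.rstrip("/") != "/api/controller":
        problems.append("path should be /api/controller")
    return problems


print(check_controller_url("https://gateway.example.com/api/controller"))
print(check_controller_url("http://gateway.example.com/api/v2"))
```

An empty list means the URL matches the expected format. Connectivity and credential validity still need to be confirmed separately.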