Troubleshoot failed event-driven automation triggers
Rulebook activations might occasionally fail for various reasons. While many issues can be resolved through basic checks, diagnosing failures across a distributed system requires robust logging.
To make troubleshooting easier, Event-Driven Ansible adds unique tracking identifiers to all of its log output.
This chapter lists possible causes of activation failures and suggests how to resolve each one. For detailed log filtering using the tracking identifiers, see Event-Driven Ansible log filtering.
Event-Driven Ansible log filtering
Event-Driven Ansible includes tracking identifiers in all log output to significantly improve troubleshooting. These identifiers help track user actions and activation processes across multiple services and log files.
| Identifier | Abbreviation | Purpose | Location |
|---|---|---|---|
| X-REQUEST-ID | `rid` | Tracks HTTP requests from the platform gateway through the entire Event-Driven Ansible request lifecycle. Use it to correlate UI actions or API calls with backend processing. | Included in the HTTP response headers and Event-Driven Ansible log entries. |
| Log Tracking ID | `tid` | Tracks the activation lifecycle from creation through completion, persisting across restarts and multiple log files. | Included in all activation-related log entries. It can be obtained from the activation History tab in the UI. |
| Activation Instance ID | `aiid` | Identifies the log entries for a single execution instance of a rulebook activation, allowing you to view that instance’s logs in isolation. | Included in activation logs. |
Not all processes originate from a user or external client. When the Event-Driven Ansible orchestrator triggers a process internally (for example, a monitor request), it generates the `rid` UUID itself to track that process’s lifecycle. That `rid` does not appear in the platform gateway logs.
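If you drive the API from a script instead of the UI, you can capture the `rid` of a user-originated request directly from the response. The following Python sketch shows one way to do this. It is illustrative only: the target URL and the bearer-token `Authorization` header are assumptions. Only the `X-REQUEST-ID` header name comes from this section.

```python
import urllib.request
from typing import Optional, Tuple


def request_id(headers) -> Optional[str]:
    """Return the X-REQUEST-ID value from a response's headers, or None if absent."""
    # http.client.HTTPMessage lookups are case-insensitive, so this also
    # matches "x-request-id" or "X-REQUEST-ID".
    return headers.get("X-Request-ID")


def call_and_capture_rid(url: str, token: str, timeout: float = 10.0) -> Tuple[int, Optional[str]]:
    """GET url and return (HTTP status, X-REQUEST-ID).

    ASSUMPTION: the URL and bearer-token auth are placeholders; adapt them
    to how your platform gateway is actually configured.
    """
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses; that
    # exception also exposes .headers if you need the rid of a failed call.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status, request_id(resp.headers)
```

Record the returned `rid` next to the action you performed so you can search for it in the backend logs later.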
The enhanced log format places these identifiers at the start of the message, making them easy to filter:
`[rid: <UUID>] [tid: <UUID>] [aiid: <ID>] aap_eda.tasks.orchestrator Processing request...`
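The identifiers use a fixed `[key: value]` shape, so they are easy to extract programmatically. The following Python sketch pulls `rid`, `tid`, and `aiid` out of a single log line. It relies only on the bracketed format shown above. It does not depend on any other part of the line, such as a timestamp or log level prefix, because those may vary by deployment.

```python
import re

# Matches one bracketed identifier such as "[tid: <UUID>]".
_ID_FIELD = re.compile(r"\[(rid|tid|aiid):\s*([^\]]+)\]")


def parse_tracking_ids(line: str) -> dict[str, str]:
    """Return the rid/tid/aiid values found in a log line (missing ones are omitted)."""
    ids: dict[str, str] = {}
    for key, value in _ID_FIELD.findall(line):
        # Keep the first occurrence in case the message body repeats a key.
        ids.setdefault(key, value.strip())
    return ids
```

A line for an internally triggered process still yields its `tid` and `aiid`, even when no gateway-originated `rid` is available.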
Use log filtering for troubleshooting
Learn to filter logs using specialized tracking identifiers for efficient troubleshooting of activation issues and API request lifecycles.
Procedure
- Collect identifiers:
  - When an issue occurs, retrieve the Log Tracking ID (`tid`) from the failed activation instance’s logs in the UI History tab.
  - If the issue was triggered by a user action (such as restarting an activation), obtain the X-REQUEST-ID (`rid`) from the HTTP response headers.
- Search system logs:
  - Use the collected UUID to search your backend logs (worker, scheduler, API, and so on). This filters out irrelevant noise, so you can focus on the full timeline of the specific request or activation across all services.
- Correlate the timeline:
  - Use the common `tid` to follow the activation’s progress (or failure) across different log files and services.
- Use support tools:
  - If necessary, use the `sosreport` or `mustgather` tools, which automatically collect all relevant Event-Driven Ansible logs from `/var/log/ansible-automation-platform/eda/`.
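You can also script the search step. For a quick check, `grep -r <UUID> /var/log/ansible-automation-platform/eda/` is often enough. When you want the results in a form you can post-process, the following Python sketch walks a log directory and yields every line that mentions a given `tid` or `rid`, together with its file and line number. The `*.log` file-name pattern is an assumption, so adjust it to match how your deployment names its log files.

```python
from pathlib import Path
from typing import Iterator


def find_by_id(log_dir: str, tracking_id: str, pattern: str = "*.log") -> Iterator[tuple[str, int, str]]:
    """Yield (file path, line number, line) for every log line containing tracking_id.

    ASSUMPTION: log files match `pattern` (default "*.log"); change it if needed.
    """
    for path in sorted(Path(log_dir).rglob(pattern)):
        if not path.is_file():
            continue
        # errors="replace" keeps one undecodable byte from aborting the search.
        with path.open(encoding="utf-8", errors="replace") as fh:
            for lineno, line in enumerate(fh, start=1):
                if tracking_id in line:
                    yield str(path), lineno, line.rstrip("\n")
```

For example, `for path, n, line in find_by_id("/var/log/ansible-automation-platform/eda", tid): print(f"{path}:{n}: {line}")` prints each matching line with its location, where `tid` holds the identifier you collected.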
Resolve rulebook activations stuck in pending state
Diagnose and resolve issues preventing a rulebook activation from transitioning from Pending to a running, operational state.
Procedure
- If there are other activations running, terminate one or more of them, if possible.
- If not, check that the default worker, Redis, and activation worker are all running.
- If all systems are working as expected, check the eda-server internal logs in the worker, scheduler, API, and nginx containers and services to try to determine the problem. These logs can reveal the source of the issue, such as an exception thrown by the code, a runtime error caused by network issues, or an error in the rulebook code. If the internal logs do not provide information that leads to a resolution, report the issue to Red Hat support.
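For the Redis part of the service check, a quick reachability test is often enough to rule Redis in or out. The following Python sketch sends a `PING` using Redis’s inline command form and treats a `+PONG` reply as healthy. The host and port are assumptions: 6379 is only the Redis default, so use the values from your deployment. A password-protected server may answer `-NOAUTH` instead, which still shows that it is running, so the sketch counts that reply as up too. This does not check the default worker or the activation worker.

```python
import socket


def redis_is_up(host: str = "localhost", port: int = 6379, timeout: float = 2.0) -> bool:
    """Return True if a Redis server answers PING at host:port.

    ASSUMPTION: host and port are placeholders; 6379 is the Redis default.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # Inline-command form of PING; Redis replies with "+PONG\r\n".
            sock.sendall(b"PING\r\n")
            reply = sock.recv(64)
    except OSError:
        # Connection refused, timeout, name resolution failure, and similar.
        return False
    # A password-protected server answers "-NOAUTH ..." but is still running.
    return reply.startswith(b"+PONG") or reply.startswith(b"-NOAUTH")
```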
Fix rulebook activations stuck in a restart loop
Troubleshoot rulebook activations that restart repeatedly, which indicates a persistent error. Diagnose and fix the underlying issue so that the automation can run stably and continuously.
Procedure
Resolve event processing failures in rulebook activations
Troubleshoot why a running rulebook activation is failing to process events, focusing on common causes like source definition mismatches or internal processing errors.
Procedure
- Check the rulebook source: Review the source plugin defined in your rulebook YAML (for example, ansible.eda.webhook, ansible.eda.kafka).
- Verify event input: Confirm that the events you are sending to the Event-Driven Ansible controller are compatible with the source plugin defined in the rulebook. If the rulebook expects a Kafka message, it cannot process a generic webhook event.
- Confirm activation mapping: If you are using event streams, ensure the correct event stream is mapped to the rulebook during the activation setup. A mismatch here will result in the activation receiving no data.
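To confirm that a rulebook using the `ansible.eda.webhook` source receives data, you can send it a small, controlled event. The following Python sketch posts a JSON payload. The URL, port, and payload fields are hypothetical placeholders. It applies only to webhook-style sources: a Kafka-based rulebook needs a message published to the topic it consumes instead.

```python
import json
import urllib.request


def build_test_event(url: str, payload: dict) -> urllib.request.Request:
    """Build a POST request carrying payload as JSON for a webhook-style source."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_test_event(url: str, payload: dict, timeout: float = 5.0) -> int:
    """Send the test event and return the HTTP status code.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses and
    urllib.error.URLError if nothing is listening at the URL.
    """
    with urllib.request.urlopen(build_test_event(url, payload), timeout=timeout) as resp:
        return resp.status
```

For example, `send_test_event("http://eda.example.com:5000/test", {"status": "down"})` posts one event to a hypothetical host. If the call succeeds but the activation logs show nothing, recheck the source definition and the event stream mapping.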
Troubleshoot actions that fail to trigger after receiving events
If your rulebook activation is Running and successfully receiving events, but no actions are being executed, the issue is likely within the logic of your rulebook.
Procedure
- Check rule conditions: Review the rulebook YAML to confirm that the conditions (the when statements) are accurately written and precisely match the structure and values of the incoming event payload.
- Verify indentation and syntax: Ensure all rulebook syntax and indentation are correct, as a simple error can prevent the rule engine from evaluating conditions.
- Validate actions: Confirm that the specified action is a recognized and correctly configured action (for example, `run_job_template` with the proper arguments).
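A common cause of conditions that silently never match is a condition that references a field path the event does not have. The following Python sketch checks a dotted path against a captured event and reports whether the field is missing or holds a different value. An example is the `payload.status` part of a hypothetical condition `event.payload.status == "down"`. This is a simplified equality check for debugging. It is not a reimplementation of how the rulebook engine evaluates conditions.

```python
from typing import Any

_MISSING = object()  # sentinel: distinguishes "absent" from a None value


def lookup(event: dict, dotted_path: str) -> Any:
    """Follow a dotted path such as 'payload.status' through nested dicts."""
    current: Any = event
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return _MISSING
        current = current[key]
    return current


def explain(event: dict, dotted_path: str, expected: Any) -> str:
    """Describe why an equality condition on dotted_path would or would not match."""
    value = lookup(event, dotted_path)
    if value is _MISSING:
        return f"path '{dotted_path}' not found in event"
    if value != expected:
        return f"'{dotted_path}' is {value!r}, condition expects {expected!r}"
    return "match"
```

Pass the path without the leading `event.` prefix, and use an event copied from the activation logs so that the structure you test against is the real one.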
Troubleshoot event streams that fail to send events
Diagnose issues where an event stream is receiving data but failing to forward it, ensuring proper connectivity and correct credential setup.