Chapter 4. Configuring Ansible automation controller on OpenShift Container Platform
During a Kubernetes upgrade, automation controller must be running.
4.1. Minimizing downtime during OpenShift Container Platform upgrade
Make the following configuration changes in automation controller to minimize downtime during the upgrade.
Prerequisites
- Ansible Automation Platform 2.4 or later
- Ansible automation controller 4.4 or later
- OpenShift Container Platform:
  - Later than 4.10.42
  - Later than 4.11.16
  - Later than 4.12.0
- High availability (HA) deployment of Postgres
- Multiple worker nodes that automation controller pods can be scheduled on
Procedure
1. Enable `RECEPTOR_KUBE_SUPPORT_RECONNECT` in the AutomationController specification:

   ```yaml
   apiVersion: automationcontroller.ansible.com/v1beta1
   kind: AutomationController
   metadata:
     ...
   spec:
     ...
     ee_extra_env: |
       - name: RECEPTOR_KUBE_SUPPORT_RECONNECT
         value: enabled
   ```

2. Enable the graceful termination feature in the AutomationController specification:

   ```yaml
   termination_grace_period_seconds: <time to wait for job to finish>
   ```

3. Configure `podAntiAffinity` for the web and task pods to spread out the deployment in the AutomationController specification:

   ```yaml
   task_affinity:
     podAntiAffinity:
       preferredDuringSchedulingIgnoredDuringExecution:
         - podAffinityTerm:
             labelSelector:
               matchExpressions:
                 - key: app.kubernetes.io/name
                   operator: In
                   values:
                     - awx-task
             topologyKey: topology.kubernetes.io/zone
           weight: 100
   web_affinity:
     podAntiAffinity:
       preferredDuringSchedulingIgnoredDuringExecution:
         - podAffinityTerm:
             labelSelector:
               matchExpressions:
                 - key: app.kubernetes.io/name
                   operator: In
                   values:
                     - awx-web
             topologyKey: topology.kubernetes.io/zone
           weight: 100
   ```

4. Configure `PodDisruptionBudget` in OpenShift Container Platform:

   ```yaml
   ---
   apiVersion: policy/v1
   kind: PodDisruptionBudget
   metadata:
     name: automationcontroller-job-pods
   spec:
     maxUnavailable: 0
     selector:
       matchExpressions:
         - key: ansible-awx-job-id
           operator: Exists
   ---
   apiVersion: policy/v1
   kind: PodDisruptionBudget
   metadata:
     name: automationcontroller-web-pods
   spec:
     minAvailable: 1
     selector:
       matchExpressions:
         - key: app.kubernetes.io/name
           operator: In
           values:
             - <automationcontroller_instance_name>-web
   ---
   apiVersion: policy/v1
   kind: PodDisruptionBudget
   metadata:
     name: automationcontroller-task-pods
   spec:
     minAvailable: 1
     selector:
       matchExpressions:
         - key: app.kubernetes.io/name
           operator: In
           values:
             - <automationcontroller_instance_name>-task
   ```
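The `PodDisruptionBudget` objects above can be applied and verified with the standard OpenShift CLI. The file name `controller-pdb.yaml` and the namespace placeholder below are illustrative, not part of the product documentation:

```shell
# Apply the PodDisruptionBudget manifests shown above,
# saved locally as controller-pdb.yaml (the file name is arbitrary).
oc apply -f controller-pdb.yaml -n <automationcontroller_namespace>

# Confirm the budgets exist and inspect the allowed disruptions.
oc get poddisruptionbudget -n <automationcontroller_namespace>
```

With `maxUnavailable: 0` on job pods, node drains during the upgrade wait for running jobs rather than evicting them; `minAvailable: 1` keeps at least one web and one task pod serving throughout.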
4.2. Receptor Kubernetes retry count variable
Configure the Receptor worker within the Ansible Automation Platform Operator through the RECEPTOR_KUBE_RETRY_COUNT environment variable. This variable controls how the worker handles Kubernetes API connection failures.
The retry mechanism uses an exponential backoff strategy, capped at 5 minutes, to prevent excessive wait times during job execution errors.
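The delay growth can be illustrated with a short sketch. This is an illustration of the described behavior, not Receptor's actual code: the Fibonacci-like growth and the 5-minute (300-second) cap follow the description above, while the starting delay values are an assumption:

```python
def backoff_delays(retry_count: int, cap_seconds: int = 300) -> list[int]:
    """Illustrative retry delays: Fibonacci-like growth, capped at 5 minutes."""
    delays = []
    a, b = 1, 2  # initial delays chosen for illustration only
    for _ in range(retry_count):
        delays.append(min(a, cap_seconds))
        a, b = b, a + b
    return delays

print(backoff_delays(5))        # [1, 2, 3, 5, 8]
print(max(backoff_delays(20)))  # 300 -- later attempts hit the 5-minute cap
```

Because the per-attempt delay is capped, raising the retry count extends how long the worker keeps retrying without making individual waits unbounded.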
RECEPTOR_KUBE_RETRY_COUNT details
| Variable | Description | Default Value | Valid Range |
|---|---|---|---|
| `RECEPTOR_KUBE_RETRY_COUNT` | Sets the maximum number of retry attempts for Kubernetes API operations within the Receptor worker. Retry delays increase using exponential backoff with a Fibonacci-like sequence. | | |
Configuration recommendations
If a playbook is expected to run for more than 20 hours, or if it remains without output for more than 2 to 4 hours, increase RECEPTOR_KUBE_RETRY_COUNT from the default value. For a 20-hour job, a retry count of 10 is recommended. This setting allows approximately 2 hours for each retry attempt, ensuring the worker does not time out prematurely during long-running operations.
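The sizing guidance above amounts to simple arithmetic: spread the expected job duration across the configured retry attempts. A minimal sketch (the helper name is mine, not part of the product):

```python
def per_retry_window_hours(expected_job_hours: float, retry_count: int) -> float:
    """Approximate window each retry attempt must cover so the worker
    does not give up before a long-running job finishes."""
    return expected_job_hours / retry_count

# A 20-hour job with RECEPTOR_KUBE_RETRY_COUNT=10 leaves ~2 hours per attempt.
print(per_retry_window_hours(20, 10))  # 2.0
```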