Ce contenu n'est pas disponible dans la langue sélectionnée.
Chapter 3. Specifying dedicated nodes
A Kubernetes cluster runs on top of many Virtual Machines or nodes (generally anywhere between 2 and 20 nodes). Pods can be scheduled on any of these nodes. When you create or schedule a new pod, use the topology_spread_constraints
setting to configure how new pods are distributed across the underlying nodes when scheduled or created.
Do not schedule your pods on a single node, because if that node fails, the services that those pods provide also fails.
Schedule the control plane nodes to run on different nodes to the automation job pods. If the control plane pods share nodes with the job pods, the control plane can become resource starved and degrade the performance of the whole application.
3.1. Assigning pods to specific nodes
You can constrain the automation controller pods created by the operator to run on a certain subset of nodes.
-
node_selector
andpostgres_selector
constrain the automation controller pods to run only on the nodes that match all the specified key, or value, pairs. -
tolerations
andpostgres_tolerations
enable the automation controller pods to be scheduled onto nodes with matching taints. See Taints and Toleration in the Kubernetes documentation for further details.
The following table shows the settings and fields that can be set on the automation controller’s specification section of the YAML (or using the OpenShift UI form).
Name | Description | Default |
---|---|---|
| Path of the image to pull | postgres |
| Image version to pull | 13 |
| AutomationController pods’ nodeSelector | “”’’ |
| AutomationController pods’ topologySpreadConstraints | “”’’ |
| AutomationController pods’ tolerations | “”’’ |
| AutomationController pods’ annotations | “”’’ |
| Postgres pods’ nodeSelector | “”’’ |
| Postgres pods’ tolerations | “”’’ |
topology_spread_constraints
can help optimize spreading your control plane pods across the compute nodes that match your node selector. For example, with the maxSkew
parameter of this option set to 100
, this means maximally spread across available nodes. So if there are three matching compute nodes and three pods, one pod will be assigned to each compute node. This parameter helps prevent the control plane pods from competing for resources with each other.
Example of a custom configuration for constraining controller pods to specific nodes
spec: ... node_selector: | disktype: ssd kubernetes.io/arch: amd64 kubernetes.io/os: linux topology_spread_constraints: | - maxSkew: 100 topologyKey: "topology.kubernetes.io/zone" whenUnsatisfiable: "ScheduleAnyway" labelSelector: matchLabels: app.kubernetes.io/name: "<resourcename>" tolerations: | - key: "dedicated" operator: "Equal" value: "AutomationController" effect: "NoSchedule" postgres_selector: | disktype: ssd kubernetes.io/arch: amd64 kubernetes.io/os: linux postgres_tolerations: | - key: "dedicated" operator: "Equal" value: "AutomationController" effect: "NoSchedule"
3.2. Specify nodes for job execution
You can add a node selector to the container group pod specification to ensure they only run against certain nodes. First add a label to the nodes you want to run jobs against.
The following procedure adds a label to a node.
Procedure
List the nodes in your cluster, along with their labels:
kubectl get nodes --show-labels
The output is similar to this (shown here in a table):
Name Status Roles Age Version Labels worker0
Ready
<none>
1d
v1.13.0
…,kubernetes.io/hostname=worker0
worker1
Ready
<none>
1d
v1.13.0
…,kubernetes.io/hostname=worker1
worker2
Ready
<none>
1d
v1.13.0
…,kubernetes.io/hostname=worker2
Choose one of your nodes, and add a label to it by using the following command:
kubectl label nodes <your-node-name> <aap_node_type>=<execution>
For example:
kubectl label nodes <your-node-name> disktype=ssd
where
<your-node-name>
is the name of your chosen node.Verify that your chosen node has a
disktype=ssd
label:kubectl get nodes --show-labels
The output is similar to this (shown here in a table):
Name Status Roles Age Version Labels worker0
Ready
<none>
1d
v1.13.0
…disktype=ssd,kubernetes.io/hostname=worker0
worker1
Ready
<none>
1d
v1.13.0
…,kubernetes.io/hostname=worker1
worker2
Ready
<none>
1d
v1.13.0
…,kubernetes.io/hostname=worker2
You can see that the
worker0
node now has adisktype=ssd
label.- In the automation controller UI, specify that label in the metadata section of your customized pod specification in the container group.
apiVersion: v1 kind: Pod metadata: disktype: ssd namespace: ansible-automation-platform spec: serviceAccountName: default automountServiceAccountToken: false nodeSelector: aap_node_type: execution containers: - image: >- registry.redhat.io/ansible-automation-platform-22/ee-supported-rhel8@sha256:d134e198b179d1b21d3f067d745dd1a8e28167235c312cdc233860410ea3ec3e name: worker args: - ansible-runner - worker - '--private-data-dir=/runner' resources: requests: cpu: 250m memory: 100Mi
Extra settings
With extra_settings
, you can pass many custom settings by using the awx-operator. The parameter extra_settings
is appended to /etc/tower/settings.py
and can be an alternative to the extra_volumes
parameter.
Name | Description | Default |
---|---|---|
| Extra settings | ‘’ |
Example configuration of extra_settings
parameter
spec: extra_settings: - setting: MAX_PAGE_SIZE value: "500" - setting: AUTH_LDAP_BIND_DN value: "cn=admin,dc=example,dc=com" - setting: SYSTEM_TASK_ABS_MEM value: "500"
3.3. Custom pod timeouts
A container group job in automation controller transitions to the running
state just before you submit the pod to the Kubernetes API. Automation controller then expects the pod to enter the Running
state before AWX_CONTAINER_GROUP_POD_PENDING_TIMEOUT
seconds has elapsed. You can set AWX_CONTAINER_GROUP_POD_PENDING_TIMEOUT
to a higher value if you want automation controller to wait for longer before canceling jobs that fail to enter the Running
state. AWX_CONTAINER_GROUP_POD_PENDING_TIMEOUT
is how long automation controller waits from creation of a pod until the Ansible work begins in the pod. You can also extend the time if the pod cannot be scheduled because of resource constraints. You can do this using extra_settings
on the automation controller specification. The default value is two hours.
This is used if you are consistently launching many more jobs than Kubernetes can schedule, and jobs are spending periods longer than AWX_CONTAINER_GROUP_POD_PENDING_TIMEOUT
in pending.
Jobs are not launched until control capacity is available. If many more jobs are being launched than the container group has capacity to run, consider scaling up your Kubernetes worker nodes.
3.4. Jobs scheduled on the worker nodes
Both automation controller and Kubernetes play a role in scheduling a job.
When a job is launched, its dependencies are fulfilled, meaning any project updates or inventory updates are launched by automation controller as required by the job template, project, and inventory settings.
If the job is not blocked by other business logic in automation controller and there is control capacity in the control plane to start the job, the job is submitted to the dispatcher. The default settings of the "cost" to control a job is 1 capacity. So, a control pod with 100 capacity is able to control up to 100 jobs at a time. Given control capacity, the job transitions from pending to waiting.
The dispatcher, which is a background process in the control plan pod, starts a worker process to run the job. This communicates with the Kubernetes API using a service account associated with the container group and uses the pod specification as defined on the Container Group in automation controller to provision the pod. The job status in automation controller is shown as running.
Kubernetes now schedules the pod. A pod can remain in the pending state for AWX_CONTAINER_GROUP_POD_PENDING_TIMEOUT
. If the pod is denied through a ResourceQuota
, the job starts over at pending. You can configure a resource quota on a namespace to limit how many resources may be consumed by pods in the namespace. For further information about ResourceQuotas, see Resource Quotas.