
Appendix A. Troubleshooting containerized Ansible Automation Platform

Use this information to troubleshoot your containerized Ansible Automation Platform installation.

A.1. Troubleshooting containerized Ansible Automation Platform installation

The installation takes a long time, or has errors. What should I check?

  1. Ensure your system meets the minimum requirements as outlined in the installation guide. Factors such as improper storage choices and high network latency when distributing across many hosts have a significant impact on installation time.
  2. Check the installation log file located by default at ./aap_install.log unless otherwise changed within the local installer ansible.cfg.
  3. Enable task profiling callbacks on an ad hoc basis to give an overview of where the installation program spends the most time. To do this, use the local ansible.cfg file. Add a callback line such as this under the [defaults] section:
$ cat ansible.cfg
[defaults]
callbacks_enabled = ansible.posix.profile_tasks
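If you prefer not to edit ansible.cfg, the same callback can usually be enabled for a single run through Ansible's ANSIBLE_CALLBACKS_ENABLED environment variable. This is standard Ansible behavior rather than an installer feature, and the playbook name here is only a placeholder:

$ ANSIBLE_CALLBACKS_ENABLED=ansible.posix.profile_tasks ansible-playbook -i inventory <installer_playbook>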

Automation controller returns an error of 413

This error is due to manifest.zip license files that are larger than the nginx_client_max_body_size setting. If this error occurs, you will need to change the installation inventory file to include the following variables:

nginx_disable_hsts: false
nginx_http_port: 8081
nginx_https_port: 8444
nginx_client_max_body_size: 20m
nginx_user_headers: []

The current default setting of 20m should be enough to avoid this issue.
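To check whether a manifest is likely to exceed the limit before uploading it, compare its size with the configured value. The file path here is only an example:

$ ls -lh /path/to/manifest.zip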

The installation failed with a “502 Bad Gateway” when going to the controller UI.

This error can manifest itself in the installation program output as:

TASK [ansible.containerized_installer.automationcontroller : Wait for the Controller API to be ready] ******************************************************
fatal: [daap1.lan]: FAILED! => {"changed": false, "connection": "close", "content_length": "150", "content_type": "text/html", "date": "Fri, 29 Sep 2023 09:42:32 GMT", "elapsed": 0, "msg": "Status code was 502 and not [200]: HTTP Error 502: Bad Gateway", "redirected": false, "server": "nginx", "status": 502, "url": "https://daap1.lan:443/api/v2/ping/"}
  • Check whether an automation-controller-web container is running and whether its systemd service exists.
Note

These commands run at the regular unprivileged user level, not system wide. If you have used su to switch to the user running the containers, you must set the XDG_RUNTIME_DIR environment variable to the correct value to be able to interact with that user's systemctl units.

export XDG_RUNTIME_DIR="/run/user/$UID"
podman ps | grep web
systemctl --user | grep web

If these commands produce no output, that indicates a problem.

  1. Try restarting the automation-controller-web service:

    systemctl start automation-controller-web.service --user
    systemctl --user | grep web
    systemctl status automation-controller-web.service --user
    Sep 29 10:55:16 daap1.lan automation-controller-web[29875]: nginx: [emerg] bind() to 0.0.0.0:443 failed (98: Address already in use)
    Sep 29 10:55:16 daap1.lan automation-controller-web[29875]: nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)

    The output indicates that the port is already, or still, in use by another service, in this case nginx.

  2. Run:

    sudo pkill nginx
  3. Restart and status check the web service again.

Normal service output looks similar to the following, and the service should remain running:

Sep 29 10:59:26 daap1.lan automation-controller-web[30274]: WSGI app 0 (mountpoint='/') ready in 3 seconds on interpreter 0x1a458c10 pid: 17 (default app)
Sep 29 10:59:26 daap1.lan automation-controller-web[30274]: WSGI app 0 (mountpoint='/') ready in 3 seconds on interpreter 0x1a458c10 pid: 20 (default app)
Sep 29 10:59:27 daap1.lan automation-controller-web[30274]: 2023-09-29 09:59:27,043 INFO     [-] daphne.cli Starting server at tcp:port=8051:interface=127.0.>
Sep 29 10:59:27 daap1.lan automation-controller-web[30274]: 2023-09-29 09:59:27,043 INFO     Starting server at tcp:port=8051:interface=127.0.0.1
Sep 29 10:59:27 daap1.lan automation-controller-web[30274]: 2023-09-29 09:59:27,048 INFO     [-] daphne.server HTTP/2 support not enabled (install the http2 >
Sep 29 10:59:27 daap1.lan automation-controller-web[30274]: 2023-09-29 09:59:27,048 INFO     HTTP/2 support not enabled (install the http2 and tls Twisted ex>
Sep 29 10:59:27 daap1.lan automation-controller-web[30274]: 2023-09-29 09:59:27,049 INFO     [-] daphne.server Configuring endpoint tcp:port=8051:interface=1>
Sep 29 10:59:27 daap1.lan automation-controller-web[30274]: 2023-09-29 09:59:27,049 INFO     Configuring endpoint tcp:port=8051:interface=127.0.0.1
Sep 29 10:59:27 daap1.lan automation-controller-web[30274]: 2023-09-29 09:59:27,051 INFO     [-] daphne.server Listening on TCP address 127.0.0.1:8051
Sep 29 10:59:27 daap1.lan automation-controller-web[30274]: 2023-09-29 09:59:27,051 INFO     Listening on TCP address 127.0.0.1:8051
Sep 29 10:59:54 daap1.lan automation-controller-web[30274]: 2023-09-29 09:59:54,139 INFO success: nginx entered RUNNING state, process has stayed up for > th>
Sep 29 10:59:54 daap1.lan automation-controller-web[30274]: 2023-09-29 09:59:54,139 INFO success: nginx entered RUNNING state, process has stayed up for > th>
Sep 29 10:59:54 daap1.lan automation-controller-web[30274]: 2023-09-29 09:59:54,139 INFO success: uwsgi entered RUNNING state, process has stayed up for > th>
Sep 29 10:59:54 daap1.lan automation-controller-web[30274]: 2023-09-29 09:59:54,139 INFO success: uwsgi entered RUNNING state, process has stayed up for > th>
Sep 29 10:59:54 daap1.lan automation-controller-web[30274]: 2023-09-29 09:59:54,139 INFO success: daphne entered RUNNING state, process has stayed up for > t>
Sep 29 10:59:54 daap1.lan automation-controller-web[30274]: 2023-09-29 09:59:54,139 INFO success: daphne entered RUNNING state, process has stayed up for > t>
Sep 29 10:59:54 daap1.lan automation-controller-web[30274]: 2023-09-29 09:59:54,139 INFO success: ws-heartbeat entered RUNNING state, process has stayed up f>
Sep 29 10:59:54 daap1.lan automation-controller-web[30274]: 2023-09-29 09:59:54,139 INFO success: ws-heartbeat entered RUNNING state, process has stayed up f>
Sep 29 10:59:54 daap1.lan automation-controller-web[30274]: 2023-09-29 09:59:54,139 INFO success: cache-clear entered RUNNING state, process has stayed up fo>
Sep 29 10:59:54 daap1.lan automation-controller-web[30274]: 2023-09-29 09:59:54,139 INFO success: cache-clear entered RUNNING state, process has stayed up

You can run the installation program again to ensure everything installs as expected.

When attempting to install containerized Ansible Automation Platform in Amazon Web Services, you receive output that there is no space left on device

TASK [ansible.containerized_installer.automationcontroller : Create the receptor container] ***************************************************
fatal: [ec2-13-48-25-168.eu-north-1.compute.amazonaws.com]: FAILED! => {"changed": false, "msg": "Can't create container receptor", "stderr": "Error: creating container storage: creating an ID-mapped copy of layer \"98955f43cc908bd50ff43585fec2c7dd9445eaf05eecd1e3144f93ffc00ed4ba\": error during chown: storage-chown-by-maps: lchown usr/local/lib/python3.9/site-packages/azure/mgmt/network/v2019_11_01/operations/__pycache__/_available_service_aliases_operations.cpython-39.pyc: no space left on device: exit status 1\n", "stderr_lines": ["Error: creating container storage: creating an ID-mapped copy of layer \"98955f43cc908bd50ff43585fec2c7dd9445eaf05eecd1e3144f93ffc00ed4ba\": error during chown: storage-chown-by-maps: lchown usr/local/lib/python3.9/site-packages/azure/mgmt/network/v2019_11_01/operations/__pycache__/_available_service_aliases_operations.cpython-39.pyc: no space left on device: exit status 1"], "stdout": "", "stdout_lines": []}

If you are installing into a default Amazon Web Services marketplace RHEL instance, the /home filesystem might be too small, because /home is part of the root (/) filesystem. You need to make more space available. The documentation specifies a minimum of 40 GB for a single-node deployment of containerized Ansible Automation Platform.
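Standard tools can confirm which filesystem is exhausted before you extend it. For example, because /home is part of the root filesystem on these instances, both of the following commands typically report the same device:

$ df -h /
$ df -h /home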

A.2. Troubleshooting containerized Ansible Automation Platform configuration

Sometimes the post-install step for seeding my Ansible Automation Platform content fails. This can manifest as output similar to the following:

TASK [infra.controller_configuration.projects : Configure Controller Projects | Wait for finish the projects creation] ***************************************
Friday 29 September 2023  11:02:32 +0100 (0:00:00.443)       0:00:53.521 ******
FAILED - RETRYING: [daap1.lan]: Configure Controller Projects | Wait for finish the projects creation (1 retries left).
failed: [daap1.lan] (item={'failed': 0, 'started': 1, 'finished': 0, 'ansible_job_id': '536962174348.33944', 'results_file': '/home/aap/.ansible_async/536962174348.33944', 'changed': False, '__controller_project_item': {'name': 'AAP Config-As-Code Examples', 'organization': 'Default', 'scm_branch': 'main', 'scm_clean': 'no', 'scm_delete_on_update': 'no', 'scm_type': 'git', 'scm_update_on_launch': 'no', 'scm_url': 'https://github.com/user/repo.git'}, 'ansible_loop_var': '__controller_project_item'}) => {"__projects_job_async_results_item": {"__controller_project_item": {"name": "AAP Config-As-Code Examples", "organization": "Default", "scm_branch": "main", "scm_clean": "no", "scm_delete_on_update": "no", "scm_type": "git", "scm_update_on_launch": "no", "scm_url": "https://github.com/user/repo.git"}, "ansible_job_id": "536962174348.33944", "ansible_loop_var": "__controller_project_item", "changed": false, "failed": 0, "finished": 0, "results_file": "/home/aap/.ansible_async/536962174348.33944", "started": 1}, "ansible_job_id": "536962174348.33944", "ansible_loop_var": "__projects_job_async_results_item", "attempts": 30, "changed": false, "finished": 0, "results_file": "/home/aap/.ansible_async/536962174348.33944", "started": 1, "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}

The infra.controller_configuration.dispatch role uses an asynchronous loop with 30 retries to apply each configuration type, and the default delay between retries is 1 second. If the configuration is large, this might not be enough time to apply everything before the last retry occurs.

Increase the retry delay by setting the controller_configuration_async_delay variable to something other than 1 second. For example, setting it to 2 seconds doubles the retry time. Set this variable in the repository where the controller configuration is defined, or add it to the [all:vars] section of the installation program inventory file, as shown in the following example.
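A minimal sketch of the inventory change, assuming an INI-format installation inventory:

[all:vars]
controller_configuration_async_delay=2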

A few instances have shown that no additional modification is required, and re-running the installation program worked.

A.3. Containerized Ansible Automation Platform reference

Can you provide details of the architecture for the Ansible Automation Platform containerized design?

We use as much of the underlying native RHEL technology as possible. Podman provides the container runtime and manages the services. Many Podman commands can be used to show and investigate the solution.

For instance, use podman ps and podman images to see some of the foundational and running pieces:

[aap@daap1 aap]$ podman ps
CONTAINER ID  IMAGE                                                                        COMMAND               CREATED         STATUS         PORTS       NAMES
88ed40495117  registry.redhat.io/rhel8/postgresql-13:latest                                run-postgresql        48 minutes ago  Up 47 minutes              postgresql
8f55ba612f04  registry.redhat.io/rhel8/redis-6:latest                                      run-redis             47 minutes ago  Up 47 minutes              redis
56c40445c590  registry.redhat.io/ansible-automation-platform-24/ee-supported-rhel8:latest  /usr/bin/receptor...  47 minutes ago  Up 47 minutes              receptor
f346f05d56ee  registry.redhat.io/ansible-automation-platform-24/controller-rhel8:latest    /usr/bin/launch_a...  47 minutes ago  Up 45 minutes              automation-controller-rsyslog
26e3221963e3  registry.redhat.io/ansible-automation-platform-24/controller-rhel8:latest    /usr/bin/launch_a...  46 minutes ago  Up 45 minutes              automation-controller-task
c7ac92a1e8a1  registry.redhat.io/ansible-automation-platform-24/controller-rhel8:latest    /usr/bin/launch_a...  46 minutes ago  Up 28 minutes              automation-controller-web

[aap@daap1 aap]$ podman images
REPOSITORY                                                            TAG         IMAGE ID      CREATED      SIZE
registry.redhat.io/ansible-automation-platform-24/ee-supported-rhel8  latest      b497bdbee59e  10 days ago  3.16 GB
registry.redhat.io/ansible-automation-platform-24/controller-rhel8    latest      ed8ebb1c1baa  10 days ago  1.48 GB
registry.redhat.io/rhel8/redis-6                                      latest      78905519bb05  2 weeks ago  357 MB
registry.redhat.io/rhel8/postgresql-13                                latest      9b65bc3d0413  2 weeks ago  765 MB
[aap@daap1 aap]$

Containerized Ansible Automation Platform runs as rootless containers for maximum out-of-the-box security. This means you can install containerized Ansible Automation Platform by using any local unprivileged user account. Privilege escalation is only needed for certain root-level tasks, and you do not need to use the root account directly by default.
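To confirm that Podman runs rootless for the installation user, you can query podman info. The Go template field path shown here is an assumption to verify against your Podman version; the command prints true when running rootless:

$ podman info --format '{{.Host.Security.Rootless}}'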

Once installed, you will notice that certain items have been populated on the filesystem where the installation program was run (the underlying RHEL host).

[aap@daap1 aap]$ tree -L 1
.
├── aap_install.log
├── ansible.cfg
├── collections
├── galaxy.yml
├── inventory
├── LICENSE
├── meta
├── playbooks
├── plugins
├── README.md
├── requirements.yml
├── roles

Other containerized services that make use of features such as Podman volumes reside under the installation root directory. Here are some examples for further reference:

The containers directory contains some of the Podman specifics used and installed for the execution plane:

containers/
├── podman
├── storage
│   ├── defaultNetworkBackend
│   ├── libpod
│   ├── networks
│   ├── overlay
│   ├── overlay-containers
│   ├── overlay-images
│   ├── overlay-layers
│   ├── storage.lock
│   └── userns.lock
└── storage.conf

The controller directory has some of the installed configuration and runtime data points:

controller/
├── data
│   ├── job_execution
│   ├── projects
│   └── rsyslog
├── etc
│   ├── conf.d
│   ├── launch_awx_task.sh
│   ├── settings.py
│   ├── tower.cert
│   └── tower.key
├── nginx
│   └── etc
├── rsyslog
│   └── run
└── supervisor
    └── run

The receptor directory has the automation mesh configuration:

receptor/
├── etc
│   └── receptor.conf
└── run
    ├── receptor.sock
    └── receptor.sock.lock

After installation, you will also find other pieces in the local user's home directory, such as the .cache directory:

.cache/
├── containers
│   └── short-name-aliases.conf.lock
└── rhsm
    └── rhsm.log

As we run in the most secure manner by default, with rootless Podman, other services such as systemd also run as the non-privileged user. Under systemd you can see some of the component service controls available:

The .config directory:

.config/
├── cni
│   └── net.d
│       └── cni.lock
├── containers
│   ├── auth.json
│   └── containers.conf
└── systemd
    └── user
        ├── automation-controller-rsyslog.service
        ├── automation-controller-task.service
        ├── automation-controller-web.service
        ├── default.target.wants
        ├── podman.service.d
        ├── postgresql.service
        ├── receptor.service
        ├── redis.service
        └── sockets.target.wants
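These user units can be inspected and controlled with systemctl --user, in the same way as shown earlier for the web service. For example:

$ systemctl --user status automation-controller-task.service
$ systemctl --user restart receptor.service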

The container storage layout is specific to Podman and conforms to the Open Container Initiative (OCI) specifications. Whereas Podman run as the root user uses /var/lib/containers by default, standard users use the hierarchy under $HOME/.local.

The .local directory:

.local/
└── share
    └── containers
        ├── cache
        ├── podman
        └── storage

As an example, .local/share/containers/storage/volumes contains the volumes reported by podman volume ls:

[aap@daap1 containers]$ podman volume ls
DRIVER      VOLUME NAME
local       d73d3fe63a957bee04b4853fd38c39bf37c321d14fdab9ee3c9df03645135788
local       postgresql
local       redis_data
local       redis_etc
local       redis_run

We isolate the execution plane from the main control plane services (PostgreSQL, Redis, automation controller, receptor, automation hub, and Event-Driven Ansible).

Control plane services run with the standard Podman configuration (~/.local/share/containers/storage).

Execution plane services use a dedicated storage configuration (~/aap/containers/storage) so that execution plane containers cannot interact with the control plane.
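To look at the execution plane containers in that dedicated storage rather than the default one, Podman can be pointed at the installer-created storage.conf. The CONTAINERS_STORAGE_CONF environment variable is standard Podman behavior, but treat the exact path as an assumption based on the directory layout shown earlier:

$ CONTAINERS_STORAGE_CONF=~/aap/containers/storage.conf podman ps -a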

How can I see host resource utilization statistics?

  • Run:
$ podman container stats -a
ID            NAME                           CPU %       MEM USAGE / LIMIT  MEM %       NET IO      BLOCK IO    PIDS        CPU TIME    AVG CPU %
0d5d8eb93c18  automation-controller-web      0.23%       959.1MB / 3.761GB  25.50%      0B / 0B     0B / 0B     16          20.885142s  1.19%
3429d559836d  automation-controller-rsyslog  0.07%       144.5MB / 3.761GB  3.84%       0B / 0B     0B / 0B     6           4.099565s   0.23%
448d0bae0942  automation-controller-task     1.51%       633.1MB / 3.761GB  16.83%      0B / 0B     0B / 0B     33          34.285272s  1.93%
7f140e65b57e  receptor                       0.01%       5.923MB / 3.761GB  0.16%       0B / 0B     0B / 0B     7           1.010613s   0.06%
c1458367ca9c  redis                          0.48%       10.52MB / 3.761GB  0.28%       0B / 0B     0B / 0B     5           9.074042s   0.47%
ef712cc2dc89  postgresql                     0.09%       21.88MB / 3.761GB  0.58%       0B / 0B     0B / 0B     21          15.571059s  0.80%

The previous output is from an installation of the containerized Ansible Automation Platform solution sold by Dell (DAAP) and uses approximately 1.8 GB of RAM.
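For a one-shot sample of a single container instead of all of them, podman stats also accepts a container name and the --no-stream flag:

$ podman container stats --no-stream automation-controller-task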

How much storage is used and where?

As we run rootless Podman, the container volume storage is under the local user at $HOME/.local/share/containers/storage/volumes.

  1. To view the details of each volume run:

    $ podman volume ls
  2. Then run:

    $ podman volume inspect <volume_name>

Here is an example:

$ podman volume inspect postgresql
[
    {
        "Name": "postgresql",
        "Driver": "local",
        "Mountpoint": "/home/aap/.local/share/containers/storage/volumes/postgresql/_data",
        "CreatedAt": "2024-01-08T23:39:24.983964686Z",
        "Labels": {},
        "Scope": "local",
        "Options": {},
        "MountCount": 0,
        "NeedsCopyUp": true
    }
]
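To see how much space each volume actually consumes on disk, standard filesystem tools work against the rootless storage path, and podman system df gives an aggregate view:

$ du -sh $HOME/.local/share/containers/storage/volumes/*
$ podman system df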

Several files created by the installation program are located in $HOME/aap/ and bind-mounted into various running containers.

  1. To view the mounts associated with a container run:

    $ podman ps --format "{{.ID}}\t{{.Command}}\t{{.Names}}"
    Example:
    $ podman ps --format "{{.ID}}\t{{.Command}}\t{{.Names}}"
    89e779b81b83	run-postgresql	postgresql
    4c33cc77ef7d	run-redis	redis
    3d8a028d892d	/usr/bin/receptor...	receptor
    09821701645c	/usr/bin/launch_a...	automation-controller-rsyslog
    a2ddb5cac71b	/usr/bin/launch_a...	automation-controller-task
    fa0029a3b003	/usr/bin/launch_a...	automation-controller-web
    20f192534691	gunicorn --bind 1...	automation-eda-api
    f49804c7e6cb	daphne -b 127.0.0...	automation-eda-daphne
    d340b9c1cb74	/bin/sh -c nginx ...	automation-eda-web
    111f47de5205	aap-eda-manage rq...	automation-eda-worker-1
    171fcb1785af	aap-eda-manage rq...	automation-eda-worker-2
    049d10555b51	aap-eda-manage rq...	automation-eda-activation-worker-1
    7a78a41a8425	aap-eda-manage rq...	automation-eda-activation-worker-2
    da9afa8ef5e2	aap-eda-manage sc...	automation-eda-scheduler
    8a2958be9baf	gunicorn --name p...	automation-hub-api
    0a8b57581749	gunicorn --name p...	automation-hub-content
    68005b987498	nginx -g daemon o...	automation-hub-web
    cb07af77f89f	pulpcore-worker	automation-hub-worker-1
    a3ba05136446	pulpcore-worker	automation-hub-worker-2
  2. Then run:

    $ podman inspect <container_name> | jq -r .[].Mounts[].Source
    Example:
    /home/aap/.local/share/containers/storage/volumes/receptor_run/_data
    /home/aap/.local/share/containers/storage/volumes/redis_run/_data
    /home/aap/aap/controller/data/rsyslog
    /home/aap/aap/controller/etc/tower.key
    /home/aap/aap/controller/etc/conf.d/callback_receiver_workers.py
    /home/aap/aap/controller/data/job_execution
    /home/aap/aap/controller/nginx/etc/controller.conf
    /home/aap/aap/controller/etc/conf.d/subscription_usage_model.py
    /home/aap/aap/controller/etc/conf.d/cluster_host_id.py
    /home/aap/aap/controller/etc/conf.d/insights.py
    /home/aap/aap/controller/rsyslog/run
    /home/aap/aap/controller/data/projects
    /home/aap/aap/controller/etc/settings.py
    /home/aap/aap/receptor/etc/receptor.conf
    /home/aap/aap/controller/etc/conf.d/execution_environments.py
    /home/aap/aap/tls/extracted
    /home/aap/aap/controller/supervisor/run
    /home/aap/aap/controller/etc/uwsgi.ini
    /home/aap/aap/controller/etc/conf.d/container_groups.py
    /home/aap/aap/controller/etc/launch_awx_task.sh
    /home/aap/aap/controller/etc/tower.cert
  3. If the jq RPM is not installed, install with:

    $ sudo dnf -y install jq
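If installing jq is not possible, a Podman Go template can produce similar output. This is a sketch that assumes the Mounts structure shown by podman inspect:

$ podman inspect <container_name> --format '{{range .Mounts}}{{.Source}}{{"\n"}}{{end}}'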