Chapter 9. Monitoring and logging
You can send metrics to the Google Cloud Platform monitoring system and visualize them in the Google Cloud Platform UI. Metrics and logging for Ansible Automation Platform from GCP Marketplace are disabled by default, as there is a cost to send these metrics to GCP. Refer to Cloud Monitoring and Cloud Logging respectively for more information.
You can set up GCP monitoring and logging either:
- at deployment time (see Setting up monitoring and logging at deployment time), or
- after the deployment.
9.1. Setting up monitoring and logging after deployment
You can start or stop logging and monitoring after the deployment by using the gcp_setup_logging_monitoring playbook, available from registry.redhat.io.
9.1.1. Required permissions
You must have the following GCP IAM permissions to set up logging and monitoring:
Required roles:
- Service Account User
- Compute Instance Admin (v1)

Required permissions:
- cloudsql.instances.connect
- cloudsql.instances.get
- cloudsql.instances.login
- cloudsql.users.update
- compute.addresses.get
- compute.addresses.list
- compute.instances.delete
- compute.instances.get
- compute.instances.list
- compute.instances.setLabels
- compute.zoneOperations.get
- deploymentmanager.deployments.list
- deploymentmanager.manifests.get
- deploymentmanager.manifests.list
- file.instances.get
- file.instances.list
- file.instances.update
- file.operations.get
- iap.tunnelInstances.accessViaIAP
- logging.logEntries.create
- monitoring.timeSeries.create
- resourcemanager.projects.get
- runtimeconfig.variables.create
- runtimeconfig.variables.get
- runtimeconfig.variables.list
- runtimeconfig.variables.update
- secretmanager.secrets.create
- secretmanager.secrets.delete
- secretmanager.secrets.get
- secretmanager.versions.add
- secretmanager.versions.get
- secretmanager.versions.list
- servicenetworking.operations.get
- servicenetworking.services.addPeering
- serviceusage.services.list
9.1.2. Pulling the ansible-on-clouds-ops container image
Pull the Docker image for the Ansible on Clouds operational container that matches the version of your foundation deployment. If you are unsure of the version you have deployed, see Command Generator and the playbook gcp_get_aoc_version for more information on finding the current version of your Ansible on Clouds deployment.
Before pulling the image, ensure you are logged in to registry.redhat.io using docker. Use the following command to log in to registry.redhat.io:
$ docker login registry.redhat.io
For more information about registry login, see Registry Authentication.
For example, if your foundation deployment version is 2.4.20240215-00, you must pull the operational image with tag 2.4.20240215.
Use the following commands:
$ export IMAGE=registry.redhat.io/ansible-on-clouds/ansible-on-clouds-ops-rhel9:2.4.20240215
$ docker pull $IMAGE --platform=linux/amd64
If your foundation deployment version is not 2.4.20240215-00, refer to the tables on the Released versions page for a matching deployment version, in the Ansible on Clouds version column. Find the corresponding operational image to use, in the Ansible-on-clouds-ops container image column, for the IMAGE environment variable.
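As an illustration of that mapping, the operational image tag can be derived by stripping the build suffix from the deployment version. The version value below is a hypothetical example; for any version not following this pattern, the Released versions tables remain the authority:

```shell
# Hypothetical foundation deployment version (as reported by gcp_get_aoc_version)
AOC_VERSION="2.4.20240215-00"

# Strip the trailing "-<build>" suffix to obtain the operational image tag
IMAGE_TAG="${AOC_VERSION%-*}"

export IMAGE="registry.redhat.io/ansible-on-clouds/ansible-on-clouds-ops-rhel9:${IMAGE_TAG}"
echo "$IMAGE"
```

Here `${AOC_VERSION%-*}` removes everything from the final `-` onward, yielding the tag `2.4.20240215`.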
9.1.3. Generating data files by running the ansible-on-clouds-ops container
The following commands generate the required data file. They create a directory and an empty data template that, when populated, is used to generate the playbook.
Procedure
Create a folder to hold the configuration files.
$ mkdir command_generator_data
Populate the command_generator_data folder with the configuration file template.

Note: On Linux, any file or directory created by the command generator is owned by root:root by default. To change the ownership of the files and directories, you can run the sudo chmod command after the files are created. For more information, read Command generator - Linux files owned by root.

$ docker run --rm -v $(pwd)/command_generator_data:/data $IMAGE \
  command_generator_vars gcp_setup_logging_monitoring \
  --output-data-file /data/logging-monitoring.yml
When you have run these commands, a command_generator_data/logging-monitoring.yml template file is created.

Note: In the following example file, ansible_config_path is optional.

This template file resembles the following:

gcp_setup_logging_monitoring:
  ansible_config_path:
  cloud_credentials_path:
  deployment_name:
  extra_vars:
    components:
    default_collector_interval:
    logging_enabled:
    monitoring_enabled:
9.1.4. Updating the data file
If you do not require a parameter, remove that parameter from the configuration file.
Procedure
Edit the command_generator_data/logging-monitoring.yml file and set the following parameters:

- ansible_config_path is used by default as the standard configuration for the ansible-on-clouds offering, but if you have extra requirements in your environment you can specify your own.
- cloud_credentials_path is the path to your credentials. This must be an absolute path.
- deployment_name is the name of the deployment.
- components (optional) is the list of component types on which to carry out the setup. The default is ["controller", "hub"], which means that logging and monitoring are enabled on both automation controller and automation hub.
- monitoring_enabled (optional) is set to true to enable monitoring, false otherwise. Default = false.
- logging_enabled (optional) is set to true to enable logging, false otherwise. Default = false.
- default_collector_interval (optional) is the frequency at which the monitoring data is sent to Google Cloud. Default = 59s.

Note: The Google Cloud cost of this service depends on that frequency: the higher the collector interval, the lower the cost. Do not set values lower than 59 seconds.

Note: If monitoring and logging are disabled, the value of default_collector_interval is automatically set to 0.
After populating the data file, it should resemble the following. The values shown are provided as examples.

The optional parameters described in this section are omitted from the data file example below. The playbook uses the default value for any optional parameter that is omitted from the data file. If you want to override a default value for an optional parameter, it must be included in the data file and assigned a value.
gcp_setup_logging_monitoring:
  cloud_credentials_path: ~/secrets/GCP-secrets.json
  deployment_name: AnsibleAutomationPlatform
  extra_vars:
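For reference, a hypothetical variant of the same data file with the optional parameters included and set explicitly might look like the following (the values shown are illustrative, not recommendations):

```yaml
gcp_setup_logging_monitoring:
  cloud_credentials_path: ~/secrets/GCP-secrets.json
  deployment_name: AnsibleAutomationPlatform
  extra_vars:
    components: ["controller", "hub"]
    monitoring_enabled: true
    logging_enabled: true
    default_collector_interval: 59s
```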
9.1.5. Generating the playbook
To generate the playbook, run the command generator:
$ docker run --rm -v $(pwd)/command_generator_data:/data $IMAGE \
  command_generator gcp_setup_logging_monitoring \
  --data-file /data/logging-monitoring.yml
This generates the following command:
docker run --rm --env PLATFORM=GCP -v </path/to/gcp/service-account.json>:/home/runner/.gcp/credentials:ro \
  --env ANSIBLE_CONFIG=../gcp-ansible.cfg --env DEPLOYMENT_NAME=<deployment_name> --env GENERATE_INVENTORY=true \
  $IMAGE redhat.ansible_on_clouds.gcp_setup_logging_monitoring -e 'gcp_deployment_name=<deployment_name> \
  gcp_service_account_credentials_json_path=/home/runner/.gcp/credentials monitoring_enabled=<monitoring_enabled> \
  logging_enabled=<logging_enabled> default_collector_interval=<interval>'
Run the supplied command to execute the playbook.
$ docker run --rm --env PLATFORM=GCP -v /path/to/credentials:/home/runner/.gcp/credentials:ro \
  --env ANSIBLE_CONFIG=../gcp-ansible.cfg --env DEPLOYMENT_NAME=mu-deployment \
  --env GENERATE_INVENTORY=true $IMAGE redhat.ansible_on_clouds.gcp_setup_logging_monitoring \
  -e 'gcp_deployment_name=mu-deployment \
  gcp_service_account_credentials_json_path=/home/runner/.gcp/credentials components=["hubs","controllers"] \
  monitoring_enabled=True logging_enabled=True default_collector_interval=60s'
The process may take some time, and provides output similar to the following:
TASK [redhat.ansible_on_clouds.setup_logging_monitoring : Update runtime variable logging_enabled] ***
changed: [<user_name> -> localhost]

TASK [redhat.ansible_on_clouds.setup_logging_monitoring : Update runtime variable monitoring_enabled] ***
changed: [<user_name> -> localhost]

PLAY RECAP *********************************************************************
<user_name> : ok=20 changed=6 unreachable=0 failed=0 skipped=2 rescued=0 ignored=0
9.2. Customizing monitoring and logging
Metrics are provided by Ansible, Podman, and the Google Ops Agent. The Google Ops Agent and Podman are installed on both automation controller and automation hub VM instances; however, Ansible metrics are collected only on automation hub instances.
A configurable process (collector) runs on each automation controller VM instance and automation hub VM instance to export the collected Ansible and Podman metrics to Google Cloud Platform Monitoring. As the Google Ops Agent is part of the Google Cloud solution, it has its own configuration file.
The Google Ops Agent is also responsible for the logging configuration.
The service APIs monitoring.googleapis.com and logging.googleapis.com must be enabled for the monitoring and logging capabilities, respectively.
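Both APIs can be enabled with the gcloud CLI. The following sketch prints the command rather than executing it; the --project value is a hypothetical placeholder for your own project ID:

```shell
APIS="monitoring.googleapis.com logging.googleapis.com"

# The command to run against your project (shown here, not executed);
# "my-aap-project" is a hypothetical project ID.
ENABLE_CMD="gcloud services enable ${APIS} --project my-aap-project"
echo "${ENABLE_CMD}"
```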
Configuration
Configuration files are located on a disk shared by each automation controller and automation hub. Modify the file /aap/bootstrap/config_file_templates/<controller|hub>/monitoring.yml to configure all exporters and agents.
9.2.1. Ansible and podman configuration
The file /aap/bootstrap/config_file_templates/<controller|hub>/monitoring.yaml on automation controller or automation hub contains the configuration for collecting and sending Ansible and Podman metrics to GCP.
The default configuration for automation controller looks like this:
# This value will be set at deployment time.
# Set to zero if monitoringEnabled is false otherwise 59s
# The collection interval for each collector will be the minimum
# between the defaultCollectorInterval and all sendInterval
# of a given collector
# NB: The awx exporter should not run on controllers as
# it duplicates the number of records sent to GCP Monitoring
defaultCollectorInterval: $DEFAULT_COLLECTOR_INTERVAL
collectors:
- name: podman
  endpoint: http://localhost:9882/podman/metrics
  enabled: true
  # list of metrics to exclude
  # excludedMetrics:
  # - podman_container_created_seconds
  metrics:
  - name: podman_container_exit_code
    # interval on which the metric must be pushed to gcp
    sendInterval: 59s
The default configuration for automation hub looks like:
# This value will be set at deployment time.
# Set to zero if monitoringEnabled is false otherwise 59s
# The collection interval for each collector will be the minimum
# between the defaultCollectorInterval and all sendInterval
# of a given collector
# NB: The awx exporter should not run on controllers as
# it duplicates the number of records sent to GCP Monitoring
defaultCollectorInterval: 59s
collectors:
- name: awx
  userName: admin
  endpoint: http://<Controller_LB_IP>/api/v2/metrics/
  enabled: true
  metrics:
  - name: awx_inventories_total
    # interval on which the metric must be pushed to gcp
    sendInterval: 59s
- name: podman
  endpoint: http://localhost:9882/podman/metrics
  enabled: true
  # list of metrics to exclude
  # excludedMetrics:
  # - podman_container_created_seconds
  metrics:
  - name: podman_container_exit_code
    # interval on which the metric must be pushed to gcp
    sendInterval: 59s
where collectors is a configuration array with one item per collector, that is, awx and podman.

The awx collector requires authentication, so userName must be set to admin. The password is retrieved from the secret manager.

The endpoint should not be changed.
defaultCollectorInterval specifies the default interval at which the exporter collects the information from the metric endpoint and sends it to Google Cloud Platform Monitoring. Setting this value to 0 or omitting this attribute disables all collectors.
Each collector can be enabled or disabled separately by setting enabled to true or false.
A collector returns all available metrics grouped by families, but you can exclude the families that should not be sent to Google Cloud Platform Monitoring by adding their names to the excludedMetrics array.
For all other family metrics, you can specify the interval at which you want to collect and send them to Google Cloud Platform Monitoring. The collector interval is the minimum of the defaultCollectorInterval and all family metric intervals. This ensures that a collection is made for each set of metrics sent to Google Cloud Platform Monitoring.
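The rule above can be sketched in plain shell; the interval values below are hypothetical and serve only to illustrate how the effective collection interval is derived:

```shell
# Intervals in seconds: the default plus each family metric's sendInterval
DEFAULT_COLLECTOR_INTERVAL=59
SEND_INTERVALS="120 300"

# The collector interval is the minimum of the default and all sendIntervals
COLLECTOR_INTERVAL=$DEFAULT_COLLECTOR_INTERVAL
for interval in $SEND_INTERVALS; do
  if [ "$interval" -lt "$COLLECTOR_INTERVAL" ]; then
    COLLECTOR_INTERVAL=$interval
  fi
done
echo "Effective collector interval: ${COLLECTOR_INTERVAL}s"   # prints 59s here
```

With sendIntervals of 120s and 300s and a default of 59s, the default wins; a single 30s sendInterval would instead pull the collection interval down to 30s.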
9.2.2. Google cloud ops agent configuration
The configuration file details can be found here.
The configuration file is located at /etc/google-cloud-ops-agent/config.yml. This is a symbolic link to the shared disk file /aap/bootstrap/config_file_templates/controller/gcp-ops-agent-config.yml or /aap/bootstrap/config_file_templates/hub/gcp-ops-agent-config.yml, depending on the component type.
The configuration file contains a number of receivers specifying what should be collected by the ops agent.
Your selection of Connect Logging and Connect Metrics during deployment determines which pipelines are included in the file and therefore which logs and metrics are collected and sent to GCP.
If you need to add more pipelines post-deployment, you can insert them in /aap/bootstrap/config_file_templates/hub|controller/gcp-ops-agent-config.yml.
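As an illustrative sketch only, an extra logging pipeline added to that file might follow the standard Ops Agent schema; the receiver name, pipeline name, and log path below are hypothetical:

```yaml
logging:
  receivers:
    # Hypothetical receiver tailing an extra application log
    app_log:
      type: files
      include_paths:
        - /var/log/my-app/*.log
  service:
    pipelines:
      # Hypothetical pipeline wiring the receiver into the agent
      app_pipeline:
        receivers: [app_log]
```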
A crontab job restarts the agent if gcp-ops-agent-config.yml has changed in the last 10 minutes. The agent rereads its configuration after a restart.
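That restart condition can be sketched with find's -mmin test; the exact crontab entry and the service name below are assumptions based on a standard Google Cloud Ops Agent install, not the shipped job:

```shell
# Restart the agent only if the config file was modified within the last
# 10 minutes (mirrors the documented crontab behavior).
CONFIG="/aap/bootstrap/config_file_templates/controller/gcp-ops-agent-config.yml"

if [ -n "$(find "$CONFIG" -mmin -10 2>/dev/null)" ]; then
  # Service name assumed from the standard Ops Agent installation
  sudo systemctl restart google-cloud-ops-agent
fi
```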