Chapter 14. Prometheus and Grafana metrics under Red Hat Quay
Red Hat Quay exports a Prometheus- and Grafana-compatible endpoint on each instance to allow for easy monitoring and alerting.
14.1. Exposing the Prometheus endpoint
14.1.1. Standalone Red Hat Quay
When using podman run to start the Quay container, expose the metrics port 9091:
$ sudo podman run -d --rm -p 80:8080 -p 443:8443 -p 9091:9091 \
   --name=quay \
   -v $QUAY/config:/conf/stack:Z \
   -v $QUAY/storage:/datastorage:Z \
   registry.redhat.io/quay/quay-rhel8:v3.13
The metrics will now be available:
$ curl quay.example.com:9091/metrics
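To script this check instead of reading the raw output, the Python sketch below downloads the endpoint and keeps only the Quay-specific samples. It uses only the standard library and assumes plain HTTP on port 9091, like the curl command above; the function names are illustrative, not part of Quay.

```python
import urllib.request


def fetch_metrics(url: str, timeout: float = 5.0) -> str:
    """Download the Prometheus text exposition served on the metrics port."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def quay_sample_lines(text: str) -> list[str]:
    """Keep only sample lines for metrics whose names start with quay_.

    "# HELP" / "# TYPE" comment lines and runtime metrics such as
    go_gc_duration_seconds are dropped.
    """
    return [line for line in text.splitlines() if line.startswith("quay_")]
```

For example, quay_sample_lines(fetch_metrics("http://quay.example.com:9091/metrics")) returns the quay_* sample lines from the example host.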
See Monitoring Quay with Prometheus and Grafana for details on configuring Prometheus and Grafana to monitor Quay repository counts.
14.1.2. Red Hat Quay Operator
Determine the cluster IP for the quay-metrics service:
$ oc get services -n quay-enterprise
NAME                                  TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                             AGE
example-registry-clair-app            ClusterIP   172.30.61.161    <none>        80/TCP,8089/TCP                     18h
example-registry-clair-postgres       ClusterIP   172.30.122.136   <none>        5432/TCP                            18h
example-registry-quay-app             ClusterIP   172.30.72.79     <none>        443/TCP,80/TCP,8081/TCP,55443/TCP   18h
example-registry-quay-config-editor   ClusterIP   172.30.185.61    <none>        80/TCP                              18h
example-registry-quay-database        ClusterIP   172.30.114.192   <none>        5432/TCP                            18h
example-registry-quay-metrics         ClusterIP   172.30.37.76     <none>        9091/TCP                            18h
example-registry-quay-redis           ClusterIP   172.30.157.248   <none>        6379/TCP                            18h
Connect to your cluster and access the metrics using the cluster IP and port for the quay-metrics service:
$ oc debug node/master-0
sh-4.4# curl 172.30.37.76:9091/metrics
# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 4.0447e-05
go_gc_duration_seconds{quantile="0.25"} 6.2203e-05
...
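The cluster IP lookup can also be scripted. This Python sketch reads the `oc get services` table and returns the address to curl. It assumes the default column layout (NAME, TYPE, CLUSTER-IP, EXTERNAL-IP, PORT(S), AGE) and a service name ending in -quay-metrics, as in the example registry; both are assumptions about your deployment.

```python
def metrics_endpoint(oc_output: str, suffix: str = "-quay-metrics") -> str:
    """Return "CLUSTER-IP:PORT" for the metrics service in `oc get services` output."""
    lines = oc_output.strip().splitlines()
    header = lines[0].split()
    name_col = header.index("NAME")
    ip_col = header.index("CLUSTER-IP")
    ports_col = header.index("PORT(S)")
    for line in lines[1:]:
        fields = line.split()
        if fields and fields[name_col].endswith(suffix):
            # PORT(S) looks like "9091/TCP" (or a comma-separated list); use the first port.
            port = fields[ports_col].split(",")[0].split("/")[0]
            return f"{fields[ip_col]}:{port}"
    raise LookupError(f"no service name ends with {suffix!r}")
```

The table can be captured with subprocess.run(["oc", "get", "services", "-n", "quay-enterprise"], capture_output=True, text=True, check=True).stdout and passed to metrics_endpoint.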
14.1.3. Setting up Prometheus to consume metrics
Prometheus needs a way to access all Red Hat Quay instances running in a cluster. In the typical setup, this is done by listing all the Red Hat Quay instances in a single named DNS entry, which is then given to Prometheus.
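The idea behind a single DNS entry is that one name resolves to many addresses, and each address becomes one scrape target. Prometheus's own dns_sd_configs performs this discovery for you; the Python sketch below only illustrates it, and the default port 9091 is taken from the metrics port used earlier.

```python
import socket


def scrape_targets(dns_name: str, port: int = 9091) -> list[str]:
    """Resolve one DNS name to all of its addresses and return host:port targets."""
    infos = socket.getaddrinfo(dns_name, port, type=socket.SOCK_STREAM)
    targets = set()
    for family, _socktype, _proto, _canonname, sockaddr in infos:
        host = sockaddr[0]
        # IPv6 literals must be bracketed in host:port notation.
        targets.add(f"[{host}]:{port}" if family == socket.AF_INET6 else f"{host}:{port}")
    return sorted(targets)
```

Running scrape_targets against the name you give Prometheus is a quick way to confirm that every Quay instance appears behind it.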
14.1.4. DNS configuration under Kubernetes
A simple Kubernetes service can be configured to provide the DNS entry for Prometheus.
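A regular ClusterIP service resolves to a single virtual IP, so for this purpose the service is typically headless (clusterIP set to None): cluster DNS then answers the service name with one address per ready pod. The Python sketch below builds such a manifest as a dictionary. The service name and the pod selector (app: quay) are hypothetical; copy the real labels from your Quay pods.

```python
import json


def headless_metrics_service(name: str, namespace: str,
                             selector: dict[str, str], port: int = 9091) -> dict:
    """Return a headless Service manifest exposing the Quay metrics port."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # "None" makes the service headless: DNS returns every pod address.
            "clusterIP": "None",
            "selector": selector,
            "ports": [{"name": "metrics", "port": port, "targetPort": port}],
        },
    }


def to_manifest_json(manifest: dict) -> str:
    """Serialize for `oc apply -f -`, which accepts JSON as well as YAML."""
    return json.dumps(manifest, indent=2)
```

With the default cluster domain, a service named quay-metrics-headless in quay-enterprise would be reachable as quay-metrics-headless.quay-enterprise.svc.cluster.local, which is the name to give Prometheus.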
14.1.5. DNS configuration for a manual cluster
SkyDNS is a simple solution for managing this DNS record when not using Kubernetes. SkyDNS can run on an etcd cluster. Entries for each Red Hat Quay instance in the cluster can be added and removed in the etcd store. SkyDNS will regularly read them from there and update the list of Quay instances in the DNS record accordingly.
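SkyDNS keeps records in etcd under a key path built from the DNS labels in reverse order, so one child key per Quay instance under the same path makes a lookup of that name return every instance. The Python sketch below only computes the key and value to write. The record name quay.skydns.local, the instance IDs, and the JSON value with a single host field are assumptions to check against your SkyDNS version.

```python
import json


def skydns_key(record_name: str, instance_id: str, prefix: str = "/skydns") -> str:
    """Build the etcd key for one instance under a SkyDNS record name.

    The DNS labels are reversed, so quay.skydns.local becomes
    /skydns/local/skydns/quay, and the instance ID is appended as a child key.
    """
    labels = [label for label in record_name.strip(".").split(".") if label]
    return "/".join([prefix.rstrip("/")] + labels[::-1] + [instance_id])


def skydns_value(ip: str) -> str:
    """JSON value for one instance; assumes SkyDNS reads the address from "host"."""
    return json.dumps({"host": ip})
```

Adding an instance means writing skydns_key(...) with skydns_value(...) into etcd; removing it means deleting that key.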
14.2. Introduction to metrics
Red Hat Quay provides metrics to help monitor the registry, including metrics for general registry usage, uploads, downloads, garbage collection, and authentication.
14.2.1. General registry statistics
General registry statistics can indicate how large the registry has grown.
Metric name | Description |
---|---|
quay_user_rows | Number of users in the database |
quay_robot_rows | Number of robot accounts in the database |
quay_org_rows | Number of organizations in the database |
quay_repository_rows | Number of repositories in the database |
quay_security_scanning_unscanned_images_remaining_total | Number of images that are not scanned by the latest security scanner |
Sample metrics output
# HELP quay_user_rows number of users in the database
# TYPE quay_user_rows gauge
quay_user_rows{host="example-registry-quay-app-6df87f7b66-9tfn6",instance="",job="quay",pid="65",process_name="globalpromstats.py"} 3
# HELP quay_robot_rows number of robot accounts in the database
# TYPE quay_robot_rows gauge
quay_robot_rows{host="example-registry-quay-app-6df87f7b66-9tfn6",instance="",job="quay",pid="65",process_name="globalpromstats.py"} 2
# HELP quay_org_rows number of organizations in the database
# TYPE quay_org_rows gauge
quay_org_rows{host="example-registry-quay-app-6df87f7b66-9tfn6",instance="",job="quay",pid="65",process_name="globalpromstats.py"} 2
# HELP quay_repository_rows number of repositories in the database
# TYPE quay_repository_rows gauge
quay_repository_rows{host="example-registry-quay-app-6df87f7b66-9tfn6",instance="",job="quay",pid="65",process_name="globalpromstats.py"} 4
# HELP quay_security_scanning_unscanned_images_remaining number of images that are not scanned by the latest security scanner
# TYPE quay_security_scanning_unscanned_images_remaining gauge
quay_security_scanning_unscanned_images_remaining{host="example-registry-quay-app-6df87f7b66-9tfn6",instance="",job="quay",pid="208",process_name="secscan:application"} 5
14.2.2. Queue items
The queue items metrics provide information on the multiple queues used by Quay for managing work.
Metric name | Description |
---|---|
quay_queue_items_available | Number of items in a specific queue |
quay_queue_items_locked | Number of items that are running |
quay_queue_items_available_unlocked | Number of items that are waiting to be processed |
Metric labels
queue_name: The name of the queue. One of:
- exportactionlogs: Queued requests to export action logs. These logs are then processed and put in storage. A link is then sent to the requester via email.
- namespacegc: Queued namespaces to be garbage collected
- notification: Queue for repository notifications to be sent out
- repositorygc: Queued repositories to be garbage collected
- secscanv4: Notification queue specific to Clair V4
- dockerfilebuild: Queue for Quay docker builds
- imagestoragereplication: Queued blobs to be replicated across multiple storages
- chunk_cleanup: Queued blob segments that need to be deleted. This is only used by some storage implementations, for example, Swift.
For example, the queue labeled repositorygc contains the repositories that are marked for deletion and waiting for the repository garbage collection worker. For metrics with a queue_name label of repositorygc:
- quay_queue_items_locked is the number of repositories currently being deleted.
- quay_queue_items_available_unlocked is the number of repositories waiting to get processed by the worker.
Sample metrics output
# HELP quay_queue_items_available number of queue items that have not expired
# TYPE quay_queue_items_available gauge
quay_queue_items_available{host="example-registry-quay-app-6df87f7b66-9tfn6",instance="",job="quay",pid="63",process_name="exportactionlogsworker.py",queue_name="exportactionlogs"} 0
...
# HELP quay_queue_items_available_unlocked number of queue items that have not expired and are not locked
# TYPE quay_queue_items_available_unlocked gauge
quay_queue_items_available_unlocked{host="example-registry-quay-app-6df87f7b66-9tfn6",instance="",job="quay",pid="63",process_name="exportactionlogsworker.py",queue_name="exportactionlogs"} 0
...
# HELP quay_queue_items_locked number of queue items that have been acquired
# TYPE quay_queue_items_locked gauge
quay_queue_items_locked{host="example-registry-quay-app-6df87f7b66-9tfn6",instance="",job="quay",pid="63",process_name="exportactionlogsworker.py",queue_name="exportactionlogs"} 0
14.2.3. Garbage collection metrics
These metrics show how many resources have been removed by garbage collection (GC): how many times the GC workers have run and how many namespaces, repositories, and blobs were removed.
Metric name | Description |
---|---|
quay_gc_iterations_total | Number of iterations by the GCWorker |
quay_gc_namespaces_purged_total | Number of namespaces purged by the NamespaceGCWorker |
quay_gc_repos_purged_total | Number of repositories purged by the RepositoryGCWorker or NamespaceGCWorker |
quay_gc_storage_blobs_deleted_total | Number of storage blobs deleted |
Sample metrics output
# TYPE quay_gc_iterations_created gauge
quay_gc_iterations_created{host="example-registry-quay-app-6df87f7b66-9tfn6",instance="",job="quay",pid="208",process_name="secscan:application"} 1.6317823190189714e+09
...
# HELP quay_gc_iterations_total number of iterations by the GCWorker
# TYPE quay_gc_iterations_total counter
quay_gc_iterations_total{host="example-registry-quay-app-6df87f7b66-9tfn6",instance="",job="quay",pid="208",process_name="secscan:application"} 0
...
# TYPE quay_gc_namespaces_purged_created gauge
quay_gc_namespaces_purged_created{host="example-registry-quay-app-6df87f7b66-9tfn6",instance="",job="quay",pid="208",process_name="secscan:application"} 1.6317823190189433e+09
...
# HELP quay_gc_namespaces_purged_total number of namespaces purged by the NamespaceGCWorker
# TYPE quay_gc_namespaces_purged_total counter
quay_gc_namespaces_purged_total{host="example-registry-quay-app-6df87f7b66-9tfn6",instance="",job="quay",pid="208",process_name="secscan:application"} 0
...
# TYPE quay_gc_repos_purged_created gauge
quay_gc_repos_purged_created{host="example-registry-quay-app-6df87f7b66-9tfn6",instance="",job="quay",pid="208",process_name="secscan:application"} 1.631782319018925e+09
...
# HELP quay_gc_repos_purged_total number of repositories purged by the RepositoryGCWorker or NamespaceGCWorker
# TYPE quay_gc_repos_purged_total counter
quay_gc_repos_purged_total{host="example-registry-quay-app-6df87f7b66-9tfn6",instance="",job="quay",pid="208",process_name="secscan:application"} 0
...
# TYPE quay_gc_storage_blobs_deleted_created gauge
quay_gc_storage_blobs_deleted_created{host="example-registry-quay-app-6df87f7b66-9tfn6",instance="",job="quay",pid="208",process_name="secscan:application"} 1.6317823190189059e+09
...
# HELP quay_gc_storage_blobs_deleted_total number of storage blobs deleted
# TYPE quay_gc_storage_blobs_deleted_total counter
quay_gc_storage_blobs_deleted_total{host="example-registry-quay-app-6df87f7b66-9tfn6",instance="",job="quay",pid="208",process_name="secscan:application"} 0
...
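The *_total series are counters, so what matters is how much they grow between scrapes. The accompanying *_created gauges hold the Unix timestamp at which each counter was created, which the prometheus_client library exports alongside counters. The Python sketch below computes the increase between two scrapes and treats a drop in value, or a changed *_created timestamp, as a reset (for example, after a pod restart). In practice Prometheus's increase() and rate() functions handle resets for you; the sketch only makes the logic visible.

```python
from typing import Optional


def counter_increase(prev_value: float, cur_value: float,
                     prev_created: Optional[float] = None,
                     cur_created: Optional[float] = None) -> float:
    """Increase of a *_total counter between two scrapes of the same series.

    On a reset, the whole current value counts as new increase.
    """
    reset = cur_value < prev_value or (
        prev_created is not None and cur_created is not None
        and cur_created != prev_created
    )
    return cur_value if reset else cur_value - prev_value


def per_hour(increase: float, seconds_between: float) -> float:
    """Scale an increase measured over seconds_between seconds to a per-hour rate."""
    if seconds_between <= 0:
        raise ValueError("seconds_between must be positive")
    return increase * 3600.0 / seconds_between
```

For example, if quay_gc_repos_purged_total grows by 12 between two scrapes taken 1800 seconds apart, per_hour(12, 1800) gives 24 repositories purged per hour.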
14.2.3.1. Multipart uploads metrics
The multipart uploads metrics show the number of blob uploads to storage (S3, Rados, GoogleCloudStorage, RHOCS). These metrics can help identify issues when Quay is unable to correctly upload blobs to storage.
Metric name | Description |
---|---|
quay_multipart_uploads_started_total | Number of multipart uploads to Quay storage that started |
quay_multipart_uploads_completed_total | Number of multipart uploads to Quay storage that completed |
Sample metrics output
# TYPE quay_multipart_uploads_completed_created gauge
quay_multipart_uploads_completed_created{host="example-registry-quay-app-6df87f7b66-9tfn6",instance="",job="quay",pid="208",process_name="secscan:application"} 1.6317823308284895e+09
...
# HELP quay_multipart_uploads_completed_total number of multipart uploads to Quay storage that completed
# TYPE quay_multipart_uploads_completed_total counter
quay_multipart_uploads_completed_total{host="example-registry-quay-app-6df87f7b66-9tfn6",instance="",job="quay",pid="208",process_name="secscan:application"} 0
# TYPE quay_multipart_uploads_started_created gauge
quay_multipart_uploads_started_created{host="example-registry-quay-app-6df87f7b66-9tfn6",instance="",job="quay",pid="208",process_name="secscan:application"} 1.6317823308284352e+09
...
# HELP quay_multipart_uploads_started_total number of multipart uploads to Quay storage that started
# TYPE quay_multipart_uploads_started_total counter
quay_multipart_uploads_started_total{host="example-registry-quay-app-6df87f7b66-9tfn6",instance="",job="quay",pid="208",process_name="secscan:application"} 0
...
14.2.4. Image push / pull metrics
A number of metrics are available related to pushing and pulling images.
14.2.4.1. Image pulls total
Metric name | Description |
---|---|
quay_registry_image_pulls_total | The number of images downloaded from the registry. |
Metric labels
- protocol: the registry protocol used (should always be v2)
- ref: the reference type used to pull, either tag or manifest
- status: HTTP status code of the request
14.2.4.2. Image bytes pulled
Metric name | Description |
---|---|
quay_registry_image_pulled_estimated_bytes_total | The number of bytes downloaded from the registry |
Metric labels
- protocol: the registry protocol used (should always be v2)
14.2.4.3. Image pushes total
Metric name | Description |
---|---|
quay_registry_image_pushes_total | The number of images uploaded to the registry. |
Metric labels
- protocol: the registry protocol used (should always be v2)
- status: HTTP status code of the request
- media_type: the media type of the uploaded manifest
14.2.4.4. Image bytes pushed
Metric name | Description |
---|---|
quay_registry_image_pushed_bytes_total | The number of bytes uploaded to the registry |
Sample metrics output
# HELP quay_registry_image_pushed_bytes_total number of bytes pushed to the registry
# TYPE quay_registry_image_pushed_bytes_total counter
quay_registry_image_pushed_bytes_total{host="example-registry-quay-app-6df87f7b66-9tfn6",instance="",job="quay",pid="221",process_name="registry:application"} 0
...
14.2.5. Authentication metrics
The authentication metrics provide the number of authentication requests, labeled by type and whether it succeeded or not. For example, this metric could be used to monitor failed basic authentication requests.
Metric name | Description |
---|---|
quay_authentication_attempts_total | Number of authentication attempts across the registry and API |
Metric labels
auth_kind: The type of authentication used, including:
- basic
- oauth
- credentials
success: Whether the attempt succeeded, either True or False
Sample metrics output
# TYPE quay_authentication_attempts_created gauge
quay_authentication_attempts_created{auth_kind="basic",host="example-registry-quay-app-6df87f7b66-9tfn6",instance="",job="quay",pid="221",process_name="registry:application",success="True"} 1.6317843039374158e+09
...
# HELP quay_authentication_attempts_total number of authentication attempts across the registry and API
# TYPE quay_authentication_attempts_total counter
quay_authentication_attempts_total{auth_kind="basic",host="example-registry-quay-app-6df87f7b66-9tfn6",instance="",job="quay",pid="221",process_name="registry:application",success="True"} 2
...