Chapter 5. Troubleshooting using metrics


Use metrics for troubleshooting errors and performance issues.

For a running Red Hat build of Keycloak deployment, it is important to understand how the system performs and whether it meets your service level objectives (SLOs). For more details on SLOs, proceed to the Monitoring performance with Service Level Indicators chapter.

This guide will provide directions to answer the question: What can I do when my SLOs are not met?

Red Hat build of Keycloak consists of several components, and an issue or misconfiguration in any one of them can push your service level indicators to undesirable values.

The guidance provided by this guide is illustrated in the following example:

Observation: Latency service level objective is not met.

Metrics that indicate a problem:

  1. Red Hat build of Keycloak’s database connection pool is often exhausted, and there are threads queuing for a connection to be retrieved from the pool.
  2. Red Hat build of Keycloak’s users cache hit ratio is at a low percentage, around 5%. This means only 1 out of 20 user searches can obtain user data from the cache; the rest must load it from the database.

Possible mitigations suggested:

  • Increasing the users cache size, which would decrease the number of reads from the database.
  • Increasing the number of connections in the connection pool. This change should be checked against the metrics for your database, and the database may need tuning for the higher load, for example, by increasing the number of available processors.
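
As a rough sketch of how such mitigations could be applied on the command line, the following start options increase the users cache size and the database connection pool. The values are placeholders for illustration, not recommendations; confirm any change with a performance test as described in the notes below.

bin/kc.sh start \
  --cache-embedded-users-max-count=100000 \
  --db-pool-max-size=30
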
Note
  • This guide focuses on Red Hat build of Keycloak metrics. Troubleshooting the database itself is out of scope.
  • This guide provides general guidance. You should always confirm the configuration change by conducting a performance test comparing the metrics in question for the old and the new configuration.
Note

Grafana dashboards for the metrics below can be found in the Visualizing activities in dashboards chapter.

5.1. List of Red Hat build of Keycloak key metrics

5.2. Self-provided metrics

Learn about the key metrics that Red Hat build of Keycloak provides.

This is part of the Troubleshooting using metrics chapter.

5.2.1. Prerequisites

  • Metrics need to be enabled for Red Hat build of Keycloak. Follow the Gaining insights with metrics chapter for more details.
  • A monitoring system collecting the metrics.

5.2.2. Metrics

5.2.2.1. User Event Metrics

User event metrics are disabled by default. See Monitoring user activities with event metrics on how to enable them and how to configure which tags are recorded.


keycloak_user_events_total

Counting the occurrence of user events.

Tags

The tags client_id and idp are disabled by default to avoid excessive cardinality.

realm
Realm
client_id
Client ID
idp
Identity Provider
event
User event, for example login or logout. See the Server Administration Guide on event types for an overview of the available events.
error
Error specific to the event, for example invalid_user_credentials for the event login. Empty string if no error occurred.

The snippet below is an example of a response provided by the metric endpoint:

# HELP keycloak_user_events_total Keycloak user events
# TYPE keycloak_user_events_total counter
keycloak_user_events_total{client_id="security-admin-console",error="",event="code_to_token",idp="",realm="master",} 1.0
keycloak_user_events_total{client_id="security-admin-console",error="",event="login",idp="",realm="master",} 1.0
keycloak_user_events_total{client_id="security-admin-console",error="",event="logout",idp="",realm="master",} 1.0
keycloak_user_events_total{client_id="security-admin-console",error="invalid_user_credentials",event="login",idp="",realm="master",} 1.0
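
For example, in a Prometheus-based monitoring system, a query along the following lines could track the rate of failed logins per realm (the 5-minute window is an arbitrary choice):

# Rate of login events that ended with an error, per realm
sum by (realm) (rate(keycloak_user_events_total{event="login",error!=""}[5m]))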

5.2.2.2. Password hashing


keycloak_credentials_password_hashing_validations_total

Counts password hash validations.

Tags

realm
Realm
algorithm
Algorithm used for hashing the password, for example argon2
hashing_strength
String denoting the strength of the hashing algorithm, for example the number of iterations, depending on the algorithm. For example, Argon2id-1.3[m=7168,t=5,p=1]
outcome

Outcome of password validation. Possible values:

valid
Password correct
invalid
Password incorrect
error
Error when creating the hash of the password

To configure which tags are available, provide a comma-separated list of tag names in the option spi-credential-keycloak-password-validations-counter-tags. By default, all tags are enabled.

The snippet below is an example of a response provided by the metric endpoint:

# HELP keycloak_credentials_password_hashing_validations_total Password validations
# TYPE keycloak_credentials_password_hashing_validations_total counter
keycloak_credentials_password_hashing_validations_total{algorithm="argon2",hashing_strength="Argon2id-1.3[m=7168,t=5,p=1]",outcome="valid",realm="realm-0",} 39949.0
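
As another example, assuming Prometheus is used, the share of failed password validations per realm could be computed with a query similar to the following:

# Fraction of password validations with an invalid password, per realm
sum by (realm) (rate(keycloak_credentials_password_hashing_validations_total{outcome="invalid"}[5m]))
/
sum by (realm) (rate(keycloak_credentials_password_hashing_validations_total[5m]))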

5.2.3. Next steps

Return to the Troubleshooting using metrics chapter or proceed to JVM metrics.

5.3. JVM metrics

Use JVM metrics to observe performance of Red Hat build of Keycloak.

This is part of the Troubleshooting using metrics chapter.

5.3.1. Prerequisites

  • Metrics need to be enabled for Red Hat build of Keycloak. Follow the Gaining insights with metrics chapter for more details.
  • A monitoring system collecting the metrics.

5.3.2. Metrics

5.3.2.1. JVM info


jvm_info_total

Information about the JVM such as version, runtime and vendor.

5.3.2.2. Heap memory usage


jvm_memory_committed_bytes

The amount of memory that the JVM has committed for use, reflecting the portion of the allocated memory that is guaranteed to be available for the JVM to use.

jvm_memory_used_bytes

The amount of memory currently used by the JVM, indicating the actual memory consumption by the application and JVM internals.
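
For example, assuming Prometheus and the default Micrometer area tag, heap usage relative to the committed heap could be expressed as:

# Used heap as a fraction of committed heap, per instance
sum by (instance) (jvm_memory_used_bytes{area="heap"})
/
sum by (instance) (jvm_memory_committed_bytes{area="heap"})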

5.3.2.3. Garbage collection


jvm_gc_pause_seconds_max

The maximum duration, in seconds, of garbage collection pauses experienced by the JVM due to a particular cause, which helps you quickly differentiate between types of GC (minor, major) pauses.

jvm_gc_pause_seconds_sum

The total cumulative time spent in garbage collection pauses, indicating the impact of GC pauses on application performance in the JVM.

jvm_gc_pause_seconds_count

Counts the total number of garbage collection pause events, helping to assess the frequency of GC pauses in the JVM.

jvm_gc_overhead

The percentage of CPU time spent on garbage collection, indicating the impact of GC on application performance in the JVM. It refers to the proportion of the total CPU processing time that is dedicated to executing garbage collection (GC) operations, as opposed to running application code or performing other tasks. This metric helps determine how much overhead GC introduces, affecting the overall performance of the Red Hat build of Keycloak’s JVM.
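
For example, the average garbage collection pause duration over the last five minutes could be derived from the sum and count metrics with a Prometheus query such as the following (the window is an arbitrary choice):

# Average GC pause duration in seconds
rate(jvm_gc_pause_seconds_sum[5m])
/
rate(jvm_gc_pause_seconds_count[5m])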

5.3.2.4. CPU Usage in Kubernetes


container_cpu_usage_seconds_total

Cumulative CPU time consumed by the container in core-seconds.
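
For example, assuming the metric is scraped from the Kubernetes kubelet/cAdvisor endpoint and the container is named keycloak (an assumption made for illustration), per-pod CPU usage in cores could be computed as:

# CPU cores used by the keycloak container, per pod
sum by (pod) (rate(container_cpu_usage_seconds_total{container="keycloak"}[5m]))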

5.3.3. Next steps

Return to the Troubleshooting using metrics chapter or proceed to Database Metrics.

5.4. Database Metrics

Use metrics to describe Red Hat build of Keycloak’s connection to the database.

This is part of the Troubleshooting using metrics chapter.

5.4.1. Prerequisites

  • Metrics need to be enabled for Red Hat build of Keycloak. Follow the Gaining insights with metrics chapter for more details.
  • A monitoring system collecting the metrics.

5.4.2. Database connection pool metrics

Configure Red Hat build of Keycloak to use a fixed size database connection pool. See the Concepts for database connection pools chapter for more information.

Tip

If there is a high count of threads waiting for a database connection, increasing the database connection pool size is not always the best option. It might overload the database, which would then become the bottleneck. Consider the following options instead:

  • Reduce the number of HTTP worker threads using the option http-pool-max-threads to make it match the available database connections, and thereby reduce contention and resource usage in Red Hat build of Keycloak and increase throughput.
  • Check which database statements are executed on the database. If you see, for example, a lot of information about clients and groups being fetched, and the users and realms caches are full, this might indicate that it is time to increase the sizes of those caches and see if this reduces your database load.

agroal_available_count

Idle database connections.

agroal_active_count

Database connections used in ongoing transactions.

agroal_awaiting_count

Threads waiting for a database connection to become available.
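
As a sketch, a Prometheus expression such as the following can surface how many threads had to wait for a connection recently; a persistently non-zero value suggests the pool or the database is a bottleneck:

# Maximum number of threads waiting for a database connection over the last 5 minutes
max_over_time(agroal_awaiting_count[5m])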

5.4.3. Next steps

Return to the Troubleshooting using metrics chapter or proceed to HTTP metrics.

5.5. HTTP metrics

Use metrics to monitor the Red Hat build of Keycloak HTTP requests processing.

This is part of the Troubleshooting using metrics chapter.

5.5.1. Prerequisites

  • Metrics need to be enabled for Red Hat build of Keycloak. Follow the Gaining insights with metrics chapter for more details.
  • A monitoring system collecting the metrics.

5.5.2. Metrics

5.5.2.1. Processing time

The processing time is exposed by these metrics to monitor the Red Hat build of Keycloak performance and how long it takes to process the requests.

Tip

On a healthy cluster, the average processing time will remain stable. Spikes or increases in the processing time may be an early sign that some node is under load.

Tags

method
HTTP method.
outcome
A more general outcome tag.
status
The HTTP status code.
uri
The requested URI.

http_server_requests_seconds_count

The total number of requests processed.

http_server_requests_seconds_sum

The total duration for all the requests processed.
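
For example, the average request processing time can be derived from these two metrics with a Prometheus query similar to the following (the window is an arbitrary choice):

# Average HTTP request processing time in seconds, per URI
sum by (uri) (rate(http_server_requests_seconds_sum[5m]))
/
sum by (uri) (rate(http_server_requests_seconds_count[5m]))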

You can enable histograms for this metric by setting http-metrics-histograms-enabled to true, and add additional buckets for service level objectives using the option http-metrics-slos.
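
A minimal sketch of enabling histograms together with additional SLO buckets, assuming the server is started from the command line, could look like this:

bin/kc.sh start \
  --metrics-enabled=true \
  --http-metrics-histograms-enabled=true \
  --http-metrics-slos=5,10,25,50,250,500,1000,2500,5000,10000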

Note

When histograms are enabled, the percentile buckets are available. They are useful for creating heat maps and analyzing latencies, but collecting and exposing the percentile buckets will increase the load on your monitoring system.

5.5.2.2. Active requests

The current number of active requests is also available.


http_server_active_requests

The current number of active requests

5.5.2.3. Bandwidth

The metrics below help monitor the bandwidth used by Red Hat build of Keycloak, that is, the traffic consumed by the requests and responses received or sent.


http_server_bytes_written_count

The total number of responses sent.

http_server_bytes_written_sum

The total number of bytes sent.

http_server_bytes_read_count

The total number of requests received.

http_server_bytes_read_sum

The total number of bytes received.
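
For example, the outbound and inbound HTTP bandwidth in bytes per second could be estimated with Prometheus queries such as:

# Outbound HTTP bandwidth in bytes per second
rate(http_server_bytes_written_sum[5m])

# Inbound HTTP bandwidth in bytes per second
rate(http_server_bytes_read_sum[5m])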

Note

When histograms are enabled, the percentile buckets are available. They are useful for creating heat maps and analyzing latencies, but collecting and exposing the percentile buckets will increase the load on your monitoring system.

5.5.3. Next steps

Return to the Troubleshooting using metrics chapter or proceed to Clustering metrics.

5.5.4. Relevant options


http-metrics-histograms-enabled

Enables a histogram with default buckets for the duration of HTTP server requests.

CLI: --http-metrics-histograms-enabled
Env: KC_HTTP_METRICS_HISTOGRAMS_ENABLED

Available only when metrics are enabled

true, false (default)

http-metrics-slos

Service level objectives for HTTP server requests.

Use this instead of the default histogram, or use it in combination to add additional buckets. Specify a list of comma-separated values defined in milliseconds. Example with buckets from 5ms to 10s: 5,10,25,50,250,500,1000,2500,5000,10000

CLI: --http-metrics-slos
Env: KC_HTTP_METRICS_SLOS

Available only when metrics are enabled

 

5.6. Clustering metrics

Use metrics to monitor communication between Red Hat build of Keycloak nodes.

This is part of the Troubleshooting using metrics chapter.

5.6.1. Prerequisites

  • Metrics need to be enabled for Red Hat build of Keycloak. Follow the Gaining insights with metrics chapter for more details.
  • A monitoring system collecting the metrics.

5.6.2. Metrics

Deploying multiple Red Hat build of Keycloak nodes allows the load to be distributed amongst them, but this requires communication between the nodes. This section describes metrics that are useful for monitoring the communication between Red Hat build of Keycloak nodes in order to identify possible faults.

Note

This is relevant only for single site deployments. When multiple sites are used, as described in Multi-site deployments, Red Hat build of Keycloak nodes are not clustered together and therefore there is no direct communication between them.

Global tags

cluster=<name>
The cluster name. If metrics from multiple clusters are being collected, this tag helps identify which cluster they belong to.
node=<node>
The name of the node reporting the metric.
Warning

All metric names prefixed with vendor_jgroups_ are provided for troubleshooting and debugging purposes only. The metric names can change in upcoming releases of Red Hat build of Keycloak without further notice. Therefore, we advise not using them in dashboards or in monitoring and alerting.

5.6.2.1. Response Time

The following metrics expose the response time for the remote requests. The response time is measured between two nodes and includes the processing time. All requests are measured by these metrics, and the response time should remain stable through the cluster lifecycle.

Tip

In a healthy cluster, the response time will remain stable. An increase in response time may indicate a degraded cluster or a node under heavy load.

Tags

node=<node>
It identifies the sender node.
target_node=<node>
It identifies the receiver node.

vendor_jgroups_stats_sync_requests_seconds_count

The number of synchronous requests to a receiver node.

vendor_jgroups_stats_sync_requests_seconds_sum

The total duration of synchronous requests to a receiver node.
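
For ad hoc troubleshooting (keeping in mind the warning above about vendor_jgroups_ metrics), the average response time between a sender and a receiver node could be computed with a Prometheus query such as:

# Average synchronous request time in seconds, per sender and receiver node
sum by (node, target_node) (rate(vendor_jgroups_stats_sync_requests_seconds_sum[5m]))
/
sum by (node, target_node) (rate(vendor_jgroups_stats_sync_requests_seconds_count[5m]))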

Note

When histograms are enabled, the percentile buckets are available. They are useful for creating heat maps, but collecting and exposing the percentile buckets may have a negative impact on the deployment performance.

5.6.2.2. Bandwidth

These metrics collect all the bytes received and sent by Red Hat build of Keycloak. Internal messages, such as heartbeats, are counted too. Together they allow you to compute the bandwidth currently used by each node.

Important

The metric name depends on the JGroups transport protocol in use.


vendor_jgroups_tcp_get_num_bytes_received

TCP

The total number of bytes received by a node.

vendor_jgroups_udp_get_num_bytes_received

UDP

vendor_jgroups_tunnel_get_num_bytes_received

TUNNEL

vendor_jgroups_tcp_get_num_bytes_sent

TCP

The total number of bytes sent by a node.

vendor_jgroups_udp_get_num_bytes_sent

UDP

vendor_jgroups_tunnel_get_num_bytes_sent

TUNNEL

5.6.2.3. Thread Pool

Monitoring the thread pool size is a good way to detect that a node is under heavy load. All received requests are added to the thread pool for processing and, when it is full, requests are discarded. A retransmission mechanism ensures reliable communication, at the cost of increased resource usage.

Tip

In a healthy cluster, the thread pool should never be close to its maximum size (by default, 200 threads).

Note

Thread pool metrics are not available with virtual threads. Virtual threads are enabled by default when running with OpenJDK 21.

Important

The metric name depends on the JGroups transport protocol in use. The default transport protocol is TCP.


vendor_jgroups_tcp_get_thread_pool_size

TCP

Current number of threads in the thread pool.

vendor_jgroups_udp_get_thread_pool_size

UDP

vendor_jgroups_tunnel_get_thread_pool_size

TUNNEL

vendor_jgroups_tcp_get_largest_size

TCP

The largest number of threads that have ever simultaneously been in the pool.

vendor_jgroups_udp_get_largest_size

UDP

vendor_jgroups_tunnel_get_largest_size

TUNNEL

5.6.2.4. Flow Control

Flow control takes care of adjusting the rate of a message sender to the rate of the slowest receiver over time. This is implemented through a credit-based system, where each sender decrements its credits when sending. The sender blocks when the credits fall below 0, and only resumes sending messages when it receives a replenishment message from the receivers.

The metrics below show the number of blocked messages and the average blocking time. When a value is different from zero, it may signal that a receiver is overloaded, which may degrade the cluster performance.

Each node has two independent flow control protocols, UFC for unicast messages and MFC for multicast messages.

Tip

A healthy cluster shows a value of zero for all metrics.


vendor_jgroups_ufc_get_number_of_blockings

The number of times flow control blocks the sender for unicast messages.

vendor_jgroups_ufc_get_average_time_blocked

Average time blocked (in ms) in flow control when trying to send a unicast message.

vendor_jgroups_mfc_get_number_of_blockings

The number of times flow control blocks the sender for multicast messages.

vendor_jgroups_mfc_get_average_time_blocked

Average time blocked (in ms) in flow control when trying to send a multicast message.

5.6.2.5. Retransmissions

JGroups provides reliable delivery of messages. When a message is dropped on the network, or the receiver cannot handle it, a retransmission is required. Retransmissions increase resource usage and are usually a signal of an overloaded system.

Random Early Drop (RED) monitors the sender queues. When the queues are almost full, the message is dropped, and a retransmission must happen. It prevents threads from being blocked by a full sender queue.

Tip

A healthy cluster shows a value of zero for all metrics.


vendor_jgroups_unicast3_get_num_xmits

The number of retransmitted messages.

vendor_jgroups_red_get_dropped_messages

The total number of dropped messages by the sender.

vendor_jgroups_red_get_drop_rate

Percentage of all messages that were dropped by the sender.

5.6.2.6. Network Partitions

5.6.2.6.1. Cluster Size

The cluster size metric reports the number of nodes present in the cluster. If it differs between nodes, it may signal that a node is joining or shutting down or, in the worst case, that a network partition is happening.

Tip

A healthy cluster shows the same value in all nodes.


vendor_cluster_size

The number of nodes in the cluster.

5.6.2.6.2. Network Partition Events

Network partitions in a cluster can happen for various reasons. This metric does not help predict network splits, but it signals that one happened and that the cluster has since been merged.

Tip

A healthy cluster shows a value of zero for this metric.


vendor_jgroups_merge3_get_num_merge_events

The number of times a network split was detected and healed.

5.6.3. Next steps

Return to the Troubleshooting using metrics chapter or proceed to Embedded Infinispan metrics for single site deployments.

5.7. Embedded Infinispan metrics for single site deployments

Use metrics to monitor caching health and cluster replication.

This is part of the Troubleshooting using metrics chapter.

5.7.1. Prerequisites

  • Metrics need to be enabled for Red Hat build of Keycloak. Follow the Gaining insights with metrics chapter for more details.
  • A monitoring system collecting the metrics.

5.7.2. Metrics

Global tags

cache=<name>
The cache name.

5.7.2.1. Size

Monitor the number of entries in your cache using these two metrics. If the cache is clustered, each entry has an owner node and zero or more backup copies on different nodes.

Tip

Sum the unique entries metric across nodes to get the total number of entries in the cluster.


vendor_statistics_approximate_entries

The approximate number of entries stored by the node, including backup copies.

vendor_statistics_approximate_entries_unique

The approximate number of entries stored by the node, excluding backup copies.

5.7.2.2. Data Access

The following metrics monitor the cache accesses, such as the reads, writes and their duration.

5.7.2.2.1. Stores

A store operation is a write operation that writes or updates a value stored in the cache.


vendor_statistics_store_times_seconds_count

The total number of store requests.

vendor_statistics_store_times_seconds_sum

The total duration of all store requests.

Note

When histograms are enabled, the percentile buckets are available. They are useful for creating heat maps, but collecting and exposing the percentile buckets may have a negative impact on the deployment performance.

5.7.2.2.2. Reads

A read operation reads a value from the cache. Reads fall into two groups: a hit if a value is found, and a miss if it is not found.


vendor_statistics_hit_times_seconds_count

The total number of read hits requests.

vendor_statistics_hit_times_seconds_sum

The total duration of all read hits requests.

vendor_statistics_miss_times_seconds_count

The total number of read misses requests.

vendor_statistics_miss_times_seconds_sum

The total duration of all read misses requests.

Note

When histograms are enabled, the percentile buckets are available. They are useful for creating heat maps, but collecting and exposing the percentile buckets may have a negative impact on the deployment performance.

5.7.2.2.3. Removes

A remove operation removes a value from the cache. Removes fall into two groups: a hit if the value exists, and a miss if it does not.


vendor_statistics_remove_hit_times_seconds_count

The total number of remove hits requests.

vendor_statistics_remove_hit_times_seconds_sum

The total duration of all remove hits requests.

vendor_statistics_remove_miss_times_seconds_count

The total number of remove misses requests.

vendor_statistics_remove_miss_times_seconds_sum

The total duration of all remove misses requests.

Note

When histograms are enabled, the percentile buckets are available. They are useful for creating heat maps, but collecting and exposing the percentile buckets may have a negative impact on the deployment performance.

Tip

For the users and realms caches, a database invalidation translates into a remove operation. These metrics are a good indicator of how frequently database entities are modified and therefore removed from the cache.

Hit Ratio for read and remove operations

An expression can be used to compute the hit ratio for a cache in systems such as Prometheus. As an example, the hit ratio for read operations can be expressed as:

vendor_statistics_hit_times_seconds_count
/
(vendor_statistics_hit_times_seconds_count
 + vendor_statistics_miss_times_seconds_count)

Read/Write ratio

An expression can be used to compute the read-write ratio for a cache, using the metrics above:

(vendor_statistics_hit_times_seconds_count
 + vendor_statistics_miss_times_seconds_count)
/
(vendor_statistics_hit_times_seconds_count
 + vendor_statistics_miss_times_seconds_count
 + vendor_statistics_remove_hit_times_seconds_count
 + vendor_statistics_remove_miss_times_seconds_count
 + vendor_statistics_store_times_seconds_count)
5.7.2.2.4. Eviction

Eviction is the process of limiting the cache size: when the cache is full, an entry is removed to make room for a new entry to be cached. As Red Hat build of Keycloak caches the database entities in the users, realms and authorization caches, a database access is always accompanied by an eviction event once these caches are full.


vendor_statistics_evictions

The total number of eviction events.

Eviction rate

A rapid increase of evictions together with very high database CPU usage means the users or realms cache is too small for smooth Red Hat build of Keycloak operation, as data needs to be re-loaded very often from the database, which slows down responses. If enough memory is available, consider increasing the maximum cache size using the CLI options cache-embedded-users-max-count or cache-embedded-realms-max-count.
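
Assuming the metric is exposed as a monotonically increasing total, the eviction rate per cache could be estimated with a Prometheus query such as:

# Evictions per second, per cache
sum by (cache) (rate(vendor_statistics_evictions[5m]))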

5.7.2.3. Locking

Write and remove operations hold the lock until the value is replicated in the local cluster and to the remote site.

Tip

On a healthy cluster, the number of locks held should remain constant, but deadlocks may create temporary spikes.


vendor_lock_manager_number_of_locks_held

The number of locks currently being held by this node.

5.7.2.4. Transactions

Transactional caches use both One-Phase-Commit and Two-Phase-Commit protocols to complete a transaction. These metrics keep track of the operation duration.

Note

The PESSIMISTIC locking mode uses One-Phase-Commit and does not create commit requests.

Tip

In a healthy cluster, the number of rollbacks should remain zero. Deadlocks should be rare, but they increase the number of rollbacks.


vendor_transactions_prepare_times_seconds_count

The total number of prepare requests.

vendor_transactions_prepare_times_seconds_sum

The total duration of all prepare requests.

vendor_transactions_rollback_times_seconds_count

The total number of rollback requests.

vendor_transactions_rollback_times_seconds_sum

The total duration of all rollback requests.

vendor_transactions_commit_times_seconds_count

The total number of commit requests.

vendor_transactions_commit_times_seconds_sum

The total duration of all commit requests.

Note

When histograms are enabled, the percentile buckets are available. They are useful for creating heat maps, but collecting and exposing the percentile buckets may have a negative impact on the deployment performance.

5.7.2.5. State Transfer

State transfer happens when a node joins or leaves the cluster. It is required to balance the data stored and guarantee the desired number of copies.

This operation increases resource usage and negatively affects the overall performance.


vendor_state_transfer_manager_inflight_transactional_segment_count

The number of in-flight transactional segments the local node requested from other nodes.

vendor_state_transfer_manager_inflight_segment_transfer_count

The number of in-flight segments the local node requested from other nodes.

5.7.2.6. Cluster Data Replication

Cluster data replication can be the main source of failure. These metrics report not only the response time, that is, the time it takes to replicate an update, but also the failures.

Tip

On a healthy cluster, the average replication time will be stable or with little variance. The number of failures should not increase.


vendor_rpc_manager_replication_count

The total number of successful replications.

vendor_rpc_manager_replication_failures

The total number of failed replications.

vendor_rpc_manager_average_replication_time

The average time spent, in milliseconds, replicating data in the cluster.

Success ratio

An expression can be used to compute the replication success ratio:

(vendor_rpc_manager_replication_count)
/
(vendor_rpc_manager_replication_count
 + vendor_rpc_manager_replication_failures)

5.7.3. Next steps

Return to the Troubleshooting using metrics chapter.

5.8. Embedded Infinispan metrics for multi-site deployments

Use metrics to monitor caching health.

This is part of the Troubleshooting using metrics chapter.

5.8.1. Prerequisites

  • Metrics need to be enabled for Red Hat build of Keycloak. Follow the Gaining insights with metrics chapter for more details.
  • A monitoring system collecting the metrics.

5.8.2. Metrics

Global tags

cache=<name>
The cache name.

5.8.2.1. Size

Monitor the number of entries in your cache using these two metrics. If the cache is clustered, each entry has an owner node and zero or more backup copies on different nodes.

Tip

Sum the unique entries metric across nodes to get the total number of entries in the cluster.


vendor_statistics_approximate_entries

The approximate number of entries stored by the node, including backup copies.

vendor_statistics_approximate_entries_unique

The approximate number of entries stored by the node, excluding backup copies.

5.8.2.2. Data Access

The following metrics monitor the cache accesses, such as the reads, writes and their duration.

5.8.2.2.1. Stores

A store operation is a write operation that writes or updates a value stored in the cache.


vendor_statistics_store_times_seconds_count

The total number of store requests.

vendor_statistics_store_times_seconds_sum

The total duration of all store requests.

Note

When histograms are enabled, the percentile buckets are available. They are useful for creating heat maps, but collecting and exposing the percentile buckets may have a negative impact on the deployment performance.

5.8.2.2.2. Reads

A read operation reads a value from the cache. Reads fall into two groups: a hit if a value is found, and a miss if it is not found.


vendor_statistics_hit_times_seconds_count

The total number of read hits requests.

vendor_statistics_hit_times_seconds_sum

The total duration of all read hits requests.

vendor_statistics_miss_times_seconds_count

The total number of read misses requests.

vendor_statistics_miss_times_seconds_sum

The total duration of all read misses requests.

Note

When histograms are enabled, the percentile buckets are available. They are useful for creating heat maps, but collecting and exposing the percentile buckets may have a negative impact on the deployment performance.

5.8.2.2.3. Removes

A remove operation removes a value from the cache. Removes fall into two groups: a hit if the value exists, and a miss if it does not.


vendor_statistics_remove_hit_times_seconds_count

The total number of remove hits requests.

vendor_statistics_remove_hit_times_seconds_sum

The total duration of all remove hits requests.

vendor_statistics_remove_miss_times_seconds_count

The total number of remove misses requests.

vendor_statistics_remove_miss_times_seconds_sum

The total duration of all remove misses requests.

Note

When histograms are enabled, the percentile buckets are available. They are useful for creating heat maps, but collecting and exposing the percentile buckets may have a negative impact on the deployment performance.

Tip

For the users and realms caches, a database invalidation translates into a remove operation. These metrics are a good indicator of how frequently database entities are modified and therefore removed from the cache.

Hit Ratio for read and remove operations

An expression can be used to compute the hit ratio for a cache in systems such as Prometheus. As an example, the hit ratio for read operations can be expressed as:

vendor_statistics_hit_times_seconds_count
/
(vendor_statistics_hit_times_seconds_count
 + vendor_statistics_miss_times_seconds_count)

Read/Write ratio

An expression can be used to compute the read-write ratio for a cache, using the metrics above:

(vendor_statistics_hit_times_seconds_count
 + vendor_statistics_miss_times_seconds_count)
/
(vendor_statistics_hit_times_seconds_count
 + vendor_statistics_miss_times_seconds_count
 + vendor_statistics_remove_hit_times_seconds_count
 + vendor_statistics_remove_miss_times_seconds_count
 + vendor_statistics_store_times_seconds_count)
5.8.2.2.4. Eviction

Eviction is the process of limiting the cache size: when the cache is full, an entry is removed to make room for a new entry to be cached. As Red Hat build of Keycloak caches the database entities in the users, realms and authorization caches, a database access is always accompanied by an eviction event once these caches are full.


vendor_statistics_evictions

The total number of eviction events.

Eviction rate

A rapid increase of evictions together with very high database CPU usage means the users or realms cache is too small for smooth Red Hat build of Keycloak operation, as data needs to be re-loaded very often from the database, which slows down responses. If enough memory is available, consider increasing the maximum cache size using the CLI options cache-embedded-users-max-count or cache-embedded-realms-max-count.

5.8.2.3. Transactions

Transactional caches use both One-Phase-Commit and Two-Phase-Commit protocols to complete a transaction. These metrics keep track of the operation duration.

Note

The PESSIMISTIC locking mode uses One-Phase-Commit and does not create commit requests.

Tip

In a healthy cluster, the number of rollbacks should remain zero. Deadlocks should be rare, but they increase the number of rollbacks.


vendor_transactions_prepare_times_seconds_count

The total number of prepare requests.

vendor_transactions_prepare_times_seconds_sum

The total duration of all prepare requests.

vendor_transactions_rollback_times_seconds_count

The total number of rollback requests.

vendor_transactions_rollback_times_seconds_sum

The total duration of all rollback requests.

vendor_transactions_commit_times_seconds_count

The total number of commit requests.

vendor_transactions_commit_times_seconds_sum

The total duration of all commit requests.

Note

When histograms are enabled, the percentile buckets are available. They are useful for creating heat maps, but collecting and exposing the percentile buckets may have a negative impact on the deployment performance.

5.8.3. Next steps

Return to the Troubleshooting using metrics chapter or proceed to External Data Grid metrics.

5.9. External Data Grid metrics

Use metrics to monitor external Data Grid performance.

This is part of the Troubleshooting using metrics chapter.

5.9.1. Prerequisites

5.9.1.1. Enabled Data Grid server metrics

Data Grid exposes metrics on the /metrics endpoint. They are enabled by default. We recommend enabling the attribute name-as-tags, as it makes the metric names independent of the cache name.

To configure metrics in the Data Grid server, enable them as shown in the XML below.

infinispan.xml

<infinispan>
    <cache-container statistics="true">
        <metrics gauges="true" histograms="false" name-as-tags="true" />
    </cache-container>
</infinispan>

When using the Data Grid Operator in Kubernetes, metrics can be enabled by using a ConfigMap with a custom configuration. An example is shown below.

ConfigMap

apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-config
data:
  infinispan-config.yaml: >
    infinispan:
      cacheContainer:
        metrics:
          gauges: true
          namesAsTags: true
          histograms: false

infinispan.yaml CR

apiVersion: infinispan.org/v1
kind: Infinispan
metadata:
  name: infinispan
  annotations:
    infinispan.org/monitoring: 'true' # (1)
spec:
  configMapName: "cluster-config" # (2)

(1) Enables monitoring for the deployment.
(2) Sets the ConfigMap name with the custom configuration.

Additional information can be found in the Infinispan documentation and Infinispan operator documentation.

5.9.2. Clustering and Network

This section describes metrics that are useful for monitoring the communication between Data Grid nodes to identify possible network issues.

Global tags

cluster=<name>
The cluster name. If metrics from multiple clusters are being collected, this tag helps identify which cluster they belong to.
node=<node>
The name of the node reporting the metric.
Warning

All metric names prefixed with vendor_jgroups_ are provided for troubleshooting and debugging purposes only. The metric names can change in upcoming releases of Red Hat build of Keycloak without further notice. Therefore, we advise not using them in dashboards or in monitoring and alerting.

5.9.2.1. Response Time

The following metrics expose the response time for the remote requests. The response time is measured between two nodes and includes the processing time. All requests are measured by these metrics, and the response time should remain stable through the cluster lifecycle.

Tip

In a healthy cluster, the response time will remain stable. An increase in response time may indicate a degraded cluster or a node under heavy load.

Tags

node=<node>
It identifies the sender node.
target_node=<node>
It identifies the receiver node.

vendor_jgroups_stats_sync_requests_seconds_count

The number of synchronous requests to a receiver node.

vendor_jgroups_stats_sync_requests_seconds_sum

The total duration of synchronous requests to a receiver node.

Note

When histograms are enabled, the percentile buckets are available. They are useful for creating heat maps, but collecting and exposing the percentile buckets may have a negative impact on the deployment performance.

5.9.2.2. Bandwidth

These metrics collect all the bytes received and sent by Data Grid. Internal messages, such as heartbeats, are counted too. Together they allow you to compute the bandwidth currently used by each node.

Important

The metric name depends on the JGroups transport protocol in use.


vendor_jgroups_tcp_get_num_bytes_received

TCP

The total number of bytes received by a node.

vendor_jgroups_udp_get_num_bytes_received

UDP

vendor_jgroups_tunnel_get_num_bytes_received

TUNNEL

vendor_jgroups_tcp_get_num_bytes_sent

TCP

The total number of bytes sent by a node.

vendor_jgroups_udp_get_num_bytes_sent

UDP

vendor_jgroups_tunnel_get_num_bytes_sent

TUNNEL

5.9.2.3. Thread Pool

Monitoring the thread pool size is a good way to detect that a node is under heavy load. All received requests are added to the thread pool for processing and, when it is full, requests are discarded. A retransmission mechanism ensures reliable communication, at the cost of increased resource usage.

Tip

In a healthy cluster, the thread pool should never be close to its maximum size (by default, 200 threads).

Note

Thread pool metrics are not available with virtual threads. Virtual threads are enabled by default when running with OpenJDK 21.

Important

The metric name depends on the JGroups transport protocol in use. The default transport protocol is TCP.


vendor_jgroups_tcp_get_thread_pool_size

TCP

Current number of threads in the thread pool.

vendor_jgroups_udp_get_thread_pool_size

UDP

vendor_jgroups_tunnel_get_thread_pool_size

TUNNEL

vendor_jgroups_tcp_get_largest_size

TCP

The largest number of threads that have ever simultaneously been in the pool.

vendor_jgroups_udp_get_largest_size

UDP

vendor_jgroups_tunnel_get_largest_size

TUNNEL

5.9.2.4. Flow Control

Flow control takes care of adjusting the rate of a message sender to the rate of the slowest receiver over time. This is implemented through a credit-based system, where each sender decrements its credits when sending. The sender blocks when the credits fall below 0, and only resumes sending messages when it receives a replenishment message from the receivers.

The metrics below show the number of blocked messages and the average blocking time. When a value is different from zero, it may signal that a receiver is overloaded, which may degrade the cluster performance.

Each node has two independent flow control protocols, UFC for unicast messages and MFC for multicast messages.

Tip

A healthy cluster shows a value of zero for all metrics.


vendor_jgroups_ufc_get_number_of_blockings

The number of times flow control blocks the sender for unicast messages.

vendor_jgroups_ufc_get_average_time_blocked

Average time blocked (in ms) in flow control when trying to send a unicast message.

vendor_jgroups_mfc_get_number_of_blockings

The number of times flow control blocks the sender for multicast messages.

vendor_jgroups_mfc_get_average_time_blocked

Average time blocked (in ms) in flow control when trying to send a multicast message.

5.9.2.5. Retransmissions

JGroups provides reliable delivery of messages. When a message is dropped on the network, or the receiver cannot handle it, a retransmission is required. Retransmissions increase resource usage and are usually a signal of an overloaded system.

Random Early Drop (RED) monitors the sender queues. When the queues are almost full, the message is dropped, and a retransmission must happen. It prevents threads from being blocked by a full sender queue.

Tip

A healthy cluster shows a value of zero for all metrics.


vendor_jgroups_unicast3_get_num_xmits

The number of retransmitted messages.

vendor_jgroups_red_get_dropped_messages

The total number of dropped messages by the sender.

vendor_jgroups_red_get_drop_rate

Percentage of all messages that were dropped by the sender.

5.9.2.6. Network Partitions

5.9.2.6.1. Cluster Size

The cluster size metric reports the number of nodes present in the cluster. If it differs between nodes, it may signal that a node is joining or shutting down or, in the worst case, that a network partition is happening.

Tip

A healthy cluster shows the same value in all nodes.


vendor_cluster_size

The number of nodes in the cluster.

5.9.2.6.2. Cross-Site Status

The cross-site status reports the connection status to the other site. It returns a value of 1 if the site is online or 0 if it is offline. A value of 2 is used on nodes where the status is unknown; not all nodes establish connections to the remote sites, so they do not have this information.

Tip

A healthy cluster shows a value greater than zero.


vendor_jgroups_site_view_status

The single site status (1 if online).

Tags

site=<name>
The name of the destination site.
5.9.2.6.3. Network Partition Events

Network partitions in a cluster can happen for various reasons. This metric does not help predict network splits, but it signals that one happened and that the cluster has since been merged.

Tip

A healthy cluster shows a value of zero for this metric.


vendor_jgroups_merge3_get_num_merge_events

The number of times a network split was detected and healed.

5.9.3. Data Grid Caches

The metrics in this section help monitor the health of the Data Grid caches and the cluster replication.

Global tags

cache=<name>
The cache name.

5.9.3.1. Size

Monitor the number of entries in your cache using these two metrics. If the cache is clustered, each entry has an owner node and zero or more backup copies on different nodes.

Tip

Sum the unique entries metric across nodes to get the total number of entries in the cluster.


vendor_statistics_approximate_entries

The approximate number of entries stored by the node, including backup copies.

vendor_statistics_approximate_entries_unique

The approximate number of entries stored by the node, excluding backup copies.

5.9.3.2. Data Access

The following metrics monitor the cache accesses, such as the reads, writes and their duration.

5.9.3.2.1. Stores

A store operation is a write operation that writes or updates a value stored in the cache.


vendor_statistics_store_times_seconds_count

The total number of store requests.

vendor_statistics_store_times_seconds_sum

The total duration of all store requests.

Note

When histograms are enabled, the percentile buckets are available. They are useful for creating heat maps, but collecting and exposing the percentile buckets may have a negative impact on the deployment performance.

5.9.3.2.2. Reads

A read operation reads a value from the cache. Reads fall into two groups: a hit if a value is found, and a miss if it is not found.


vendor_statistics_hit_times_seconds_count

The total number of read hits requests.

vendor_statistics_hit_times_seconds_sum

The total duration of all read hits requests.

vendor_statistics_miss_times_seconds_count

The total number of read misses requests.

vendor_statistics_miss_times_seconds_sum

The total duration of all read misses requests.

Note

When histograms are enabled, the percentile buckets are available. They are useful for creating heat maps, but collecting and exposing the percentile buckets may have a negative impact on the deployment performance.

5.9.3.2.3. Removes

A remove operation removes a value from the cache. Removes fall into two groups: a hit if the value exists, and a miss if it does not.


vendor_statistics_remove_hit_times_seconds_count

The total number of remove hits requests.

vendor_statistics_remove_hit_times_seconds_sum

The total duration of all remove hits requests.

vendor_statistics_remove_miss_times_seconds_count

The total number of remove misses requests.

vendor_statistics_remove_miss_times_seconds_sum

The total duration of all remove misses requests.

Note

When histograms are enabled, the percentile buckets are available. They are useful for creating heat maps, but collecting and exposing the percentile buckets may have a negative impact on the deployment performance.

5.9.3.3. Locking

Write and remove operations hold the lock until the value is replicated in the local cluster and to the remote site.

Tip

On a healthy cluster, the number of locks held should remain constant, but deadlocks may create temporary spikes.


vendor_lock_manager_number_of_locks_held

The number of locks currently being held by this node.

5.9.3.4. Transactions

Transactional caches use both One-Phase-Commit and Two-Phase-Commit protocols to complete a transaction. These metrics keep track of the operation duration.

Note

The PESSIMISTIC locking mode uses One-Phase-Commit and does not create commit requests.

Tip

In a healthy cluster, the number of rollbacks should remain zero. Deadlocks should be rare, but they increase the number of rollbacks.


vendor_transactions_prepare_times_seconds_count

The total number of prepare requests.

vendor_transactions_prepare_times_seconds_sum

The total duration of all prepare requests.

vendor_transactions_rollback_times_seconds_count

The total number of rollback requests.

vendor_transactions_rollback_times_seconds_sum

The total duration of all rollback requests.

vendor_transactions_commit_times_seconds_count

The total number of commit requests.

vendor_transactions_commit_times_seconds_sum

The total duration of all commit requests.

Note

When histograms are enabled, the percentile buckets are available. They are useful for creating heat maps, but collecting and exposing the percentile buckets may have a negative impact on the deployment performance.

5.9.3.5. State Transfer

State transfer happens when a node joins or leaves the cluster. It is required to balance the data stored and guarantee the desired number of copies.

This operation increases resource usage and negatively affects the overall performance.


vendor_state_transfer_manager_inflight_transactional_segment_count

The number of in-flight transactional segments the local node requested from other nodes.

vendor_state_transfer_manager_inflight_segment_transfer_count

The number of in-flight segments the local node requested from other nodes.

5.9.3.6. Cluster Data Replication

Cluster data replication can be the main source of failure. These metrics report not only the response time, that is, the time it takes to replicate an update, but also the failures.

Tip

On a healthy cluster, the average replication time will be stable or with little variance. The number of failures should not increase.


vendor_rpc_manager_replication_count

The total number of successful replications.

vendor_rpc_manager_replication_failures

The total number of failed replications.

vendor_rpc_manager_average_replication_time

The average time spent, in milliseconds, replicating data in the cluster.

Success ratio

An expression can be used to compute the replication success ratio:

(vendor_rpc_manager_replication_count)
/
(vendor_rpc_manager_replication_count
 + vendor_rpc_manager_replication_failures)

5.9.3.7. Cross Site Data Replication

Like cluster data replication, the metrics in this section measure the time it takes to replicate the data to the other sites.

Tip

On a healthy cluster, the average cross-site replication time will be stable or with little variance.

Tags

site=<name>
indicates the receiving site.

vendor_rpc_manager_cross_site_replication_times_seconds_count

The total number of cross-site requests.

vendor_rpc_manager_cross_site_replication_times_seconds_sum

The total duration of all cross-site requests.

vendor_rpc_manager_replication_times_to_site_seconds_count

The total number of cross-site requests. This metric is more detailed with a per-site counter.

vendor_rpc_manager_replication_times_to_site_seconds_sum

The total duration of all cross-site requests. This metric is more detailed with a per-site duration.

vendor_rpc_manager_number_xsite_requests_received_from_site

The total number of cross-site requests handled by this node. This metric is more detailed with a per-site counter.

vendor_x_site_admin_status

The site status. A value of 1 indicates that it is online. This value reacts to the Data Grid CLI commands bring-online and take-offline.
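
For example, the average cross-site replication time per destination site could be computed with a Prometheus query such as:

# Average cross-site replication time in seconds, per destination site
sum by (site) (rate(vendor_rpc_manager_replication_times_to_site_seconds_sum[5m]))
/
sum by (site) (rate(vendor_rpc_manager_replication_times_to_site_seconds_count[5m]))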

Note

When histograms are enabled, the percentile buckets are available. They are useful for creating heat maps, but collecting and exposing the percentile buckets may have a negative impact on the deployment performance.

5.9.4. Next steps

Return to the Troubleshooting using metrics chapter.
