Chapter 17. Troubleshooting Data Grid Server deployments


Gather diagnostic information about Data Grid Server deployments and perform troubleshooting steps to resolve issues.

17.1. Getting diagnostic reports from Data Grid Server

Data Grid Server provides aggregated reports in tar.gz archives that contain diagnostic information about server instances and host systems. The report includes details about CPU, memory, open files, network sockets and routing, and threads, as well as configuration and log files.

Procedure

  1. Create a CLI connection to Data Grid Server.
  2. Use the server report command to download a tar.gz archive:

    server report

    The command responds with the name of the report, as in the following example:

    Downloaded report 'infinispan-<hostname>-<timestamp>-report.tar.gz'
  3. Move the tar.gz file to a suitable location on your filesystem.
  4. Extract the tar.gz file with any archiving tool.
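If you prefer to script the extraction, the following Python sketch uses the standard tarfile module. It is a minimal example and not part of the Data Grid tooling. The function name extract_report is hypothetical. The path check is a defensive choice, because the archive is produced on another host.

```python
import tarfile
from pathlib import Path


def extract_report(archive: str, dest: str) -> list[str]:
    """Extract a Data Grid Server report archive into dest.

    Returns the member names. Raises ValueError if a member would be
    written outside dest (for example, a name such as "../x").
    """
    dest_path = Path(dest).resolve()
    with tarfile.open(archive, "r:gz") as tar:
        members = tar.getmembers()
        for member in members:
            target = (dest_path / member.name).resolve()
            if target != dest_path and dest_path not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        if hasattr(tarfile, "data_filter"):
            # Python versions with extraction filters also reject unsafe links.
            tar.extractall(dest_path, members=members, filter="data")
        else:
            tar.extractall(dest_path, members=members)
    return [member.name for member in members]
```

For example, call extract_report with the name that the server report command printed and a destination directory of your choice.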

17.1.1. Understanding the report

The report generated with the CLI contains files with information about both the host environment and the server application. The following sections group these files into categories and briefly describe how to read each one.

17.1.1.1. Environment

These files contain metrics about the environment that Data Grid Server runs on. Issues are not always caused by the application itself. External factors, such as network misconfiguration or many processes running on the same host, can also cause problems. The following files help you investigate these factors:

Host:

  • cpuinfo: Provides information about the host’s CPU.
  • df: Provides information about the host’s filesystem.
  • lsof: Lists the open files on the host.
  • meminfo: Provides information about the host’s memory.
  • os-release: Describes the current OS the server is running on.
  • uname: Provides additional system information.

Network:

  • ip-address: Describes the network interfaces in the host and the address and capabilities of each one.
  • ip-maddress: Describes the multicast addresses on each network interface in the host.
  • ip-mroute: Describes the multicast routing table in the host.
  • ip-route: Describes the routing table in the host.
  • ss-tcp: Lists the TCP sockets in the host.
  • ss-udp: Lists the UDP sockets in the host.

17.1.1.2. Data Grid server

The report also collects information from the running server, including a thread dump generated with the jstack command, the server logs, and the server configuration files.

Note

The report tries to collect all this information. However, some files might be missing if the host cannot run a specific command.
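Because some files might be missing, it can help to check an extracted report before you analyze it. The following Python sketch compares the extracted files with the names listed in the Environment section. The function and constant names are hypothetical. The sketch assumes that each file in the archive uses the listed name, possibly with an extension. It does not assume any particular directory layout inside the archive.

```python
from pathlib import Path

# File names as listed in the Environment section. Assumption: the archive
# stores each one as a file with this name (an extension is tolerated).
EXPECTED_FILES = {
    "host": ["cpuinfo", "df", "lsof", "meminfo", "os-release", "uname"],
    "network": ["ip-address", "ip-maddress", "ip-mroute", "ip-route", "ss-tcp", "ss-udp"],
}


def missing_report_files(report_dir: str) -> dict[str, list[str]]:
    """Return, per category, the expected file names not found under report_dir."""
    present = set()
    # The layout inside the report is not assumed, so search recursively.
    for path in Path(report_dir).rglob("*"):
        if path.is_file():
            present.add(path.name)
            present.add(path.stem)  # tolerate names such as "cpuinfo.txt"
    return {
        category: [name for name in names if name not in present]
        for category, names in EXPECTED_FILES.items()
    }
```

A name in the result usually means that the host could not run the corresponding command. It does not necessarily indicate a problem with the server.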

17.2. Changing the Data Grid Server logging configuration at runtime

Modify the logging configuration for Data Grid Server at runtime to temporarily adjust logging to troubleshoot issues and perform root cause analysis.

Modifying the logging configuration through the CLI is a runtime-only operation, which means that changes:

  • Are not saved to the log4j2.xml file. Restarting server nodes or the entire cluster resets the logging configuration to the default properties in the log4j2.xml file.
  • Apply only to the nodes that are in the cluster when you invoke the CLI. Nodes that join the cluster after you change the logging configuration use the default properties.

Procedure

  1. Create a CLI connection to Data Grid Server.
  2. Use the logging command to make the required adjustments.

    • List all appenders defined on the server:

      logging list-appenders

      The command provides a JSON response such as the following:

      {
        "STDOUT" : {
          "name" : "STDOUT"
        },
        "JSON-FILE" : {
          "name" : "JSON-FILE"
        },
        "HR-ACCESS-FILE" : {
          "name" : "HR-ACCESS-FILE"
        },
        "FILE" : {
          "name" : "FILE"
        },
        "REST-ACCESS-FILE" : {
          "name" : "REST-ACCESS-FILE"
        }
      }
    • List all logger configurations defined on the server:

      logging list-loggers

      The command provides a JSON response such as the following:

      [ {
        "name" : "",
        "level" : "INFO",
        "appenders" : [ "STDOUT", "FILE" ]
      }, {
        "name" : "org.infinispan.HOTROD_ACCESS_LOG",
        "level" : "INFO",
        "appenders" : [ "HR-ACCESS-FILE" ]
      }, {
        "name" : "com.arjuna",
        "level" : "WARN",
        "appenders" : [ ]
      }, {
        "name" : "org.infinispan.REST_ACCESS_LOG",
        "level" : "INFO",
        "appenders" : [ "REST-ACCESS-FILE" ]
      } ]
    • Add and modify logger configurations with the set subcommand.

      For example, the following command sets the logging level for the org.infinispan package to DEBUG:

      logging set --level=DEBUG org.infinispan
    • Remove existing logger configurations with the remove subcommand.

      For example, the following command removes the org.infinispan logger configuration, which means the root configuration is used instead:

      logging remove org.infinispan
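Log4j2 applies the configuration of the closest configured ancestor in the dotted logger-name hierarchy and falls back to the root logger, which has an empty name. That rule determines which level a logger ends up with after you run set or remove. The following Python sketch applies it to the list-loggers output shown above. It models levels only, not appender additivity. The function name effective_level and the logger name org.infinispan.example are hypothetical.

```python
import json

# Output of `logging list-loggers`, copied from the example above.
LIST_LOGGERS_OUTPUT = """
[ {"name": "", "level": "INFO", "appenders": ["STDOUT", "FILE"]},
  {"name": "org.infinispan.HOTROD_ACCESS_LOG", "level": "INFO", "appenders": ["HR-ACCESS-FILE"]},
  {"name": "com.arjuna", "level": "WARN", "appenders": []},
  {"name": "org.infinispan.REST_ACCESS_LOG", "level": "INFO", "appenders": ["REST-ACCESS-FILE"]} ]
"""


def effective_level(loggers: list[dict], name: str) -> str:
    """Return the level that applies to logger `name`.

    Uses the closest configured ancestor in the dotted name hierarchy and
    falls back to the root logger, whose name is the empty string.
    """
    levels = {entry["name"]: entry["level"] for entry in loggers}
    candidate = name
    while candidate:
        if candidate in levels:
            return levels[candidate]
        # org.infinispan.example -> org.infinispan -> org -> "" (root)
        candidate = candidate.rpartition(".")[0]
    return levels[""]


loggers = json.loads(LIST_LOGGERS_OUTPUT)
print(effective_level(loggers, "org.infinispan.example"))  # prints INFO (root logger)
```

After `logging set --level=DEBUG org.infinispan`, an org.infinispan entry appears in the list. The same lookup then returns DEBUG for org.infinispan.example. It still returns INFO for org.infinispan.HOTROD_ACCESS_LOG, which has its own configuration.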

17.2.1. Access logging

You can also use the logging command to enable access logs at runtime to analyze issues. Enable access logging for a specific endpoint, then use the CLI to parse the resulting log files and generate server-side metrics. This technique helps identify whether latency originates in the connection between the client and the server or within the cluster.

The workflow for this analysis involves a few steps:

  1. Enable access logging with the CLI for the endpoint you are analyzing.

    logging set --level=TRACE org.infinispan.RESP_ACCESS_LOG

    This example enables the access log for the RESP endpoint.

  2. Run the tests that reproduce the issue, or the workload that you want to measure.
  3. Disable access logging with the CLI.

    logging set --level=INFO org.infinispan.RESP_ACCESS_LOG
  4. Download the access log files.

    You can also use the server report command to download the log files.

With the access log files available, you can use the CLI to analyze the server’s behavior in more depth. The analysis reports long-running commands, a breakdown of requests per client, and per-operation information.

Categories

Data Grid provides an access log for each endpoint available on the server. You can edit the logging level to enable each endpoint individually. The loggers available for each endpoint are:

  • org.infinispan.HOTROD_ACCESS_LOG: Enables the access log for the Hot Rod endpoint.
  • org.infinispan.REST_ACCESS_LOG: Enables the access log for the REST endpoint.
  • org.infinispan.MEMCACHED_ACCESS_LOG: Enables the access log for the Memcached endpoint.
  • org.infinispan.RESP_ACCESS_LOG: Enables the access log for the RESP endpoint.

Each logger can be enabled individually and independently of the others.
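When you repeat the access-logging workflow for different endpoints, a small helper can produce the enable and disable commands from the logger table above. The following Python sketch only formats the CLI commands shown in the workflow. It does not connect to the server, so you still run the commands in a CLI session. The endpoint keys and the function name are hypothetical.

```python
# Logger names per endpoint, as listed above. The short keys are hypothetical.
ACCESS_LOGGERS = {
    "hotrod": "org.infinispan.HOTROD_ACCESS_LOG",
    "rest": "org.infinispan.REST_ACCESS_LOG",
    "memcached": "org.infinispan.MEMCACHED_ACCESS_LOG",
    "resp": "org.infinispan.RESP_ACCESS_LOG",
}


def access_log_commands(endpoint: str) -> tuple[str, str]:
    """Return the CLI commands that enable and then disable an endpoint's access log."""
    logger = ACCESS_LOGGERS[endpoint.lower()]
    enable = f"logging set --level=TRACE {logger}"
    disable = f"logging set --level=INFO {logger}"
    return enable, disable
```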

17.3. Gathering resource statistics from the CLI

You can inspect server-collected statistics for some Data Grid Server resources with the stats command.

Use the stats command either from the context of a resource that provides statistics (containers, caches) or with a path to such a resource:

stats
{
  "statistics_enabled" : true,
  "number_of_entries" : 0,
  "hit_ratio" : 0.0,
  "read_write_ratio" : 0.0,
  "time_since_start" : 0,
  "time_since_reset" : 49,
  "current_number_of_entries" : 0,
  "current_number_of_entries_in_memory" : 0,
  "off_heap_memory_used" : 0,
  "data_memory_used" : 0,
  "stores" : 0,
  "retrievals" : 0,
  "hits" : 0,
  "misses" : 0,
  "remove_hits" : 0,
  "remove_misses" : 0,
  "evictions" : 0,
  "average_read_time" : 0,
  "average_read_time_nanos" : 0,
  "average_write_time" : 0,
  "average_write_time_nanos" : 0,
  "average_remove_time" : 0,
  "average_remove_time_nanos" : 0,
  "required_minimum_number_of_nodes" : -1
}
stats /containers/default/caches/mycache
{
  "time_since_start" : -1,
  "time_since_reset" : -1,
  "current_number_of_entries" : -1,
  "current_number_of_entries_in_memory" : -1,
  "off_heap_memory_used" : -1,
  "data_memory_used" : -1,
  "stores" : -1,
  "retrievals" : -1,
  "hits" : -1,
  "misses" : -1,
  "remove_hits" : -1,
  "remove_misses" : -1,
  "evictions" : -1,
  "average_read_time" : -1,
  "average_read_time_nanos" : -1,
  "average_write_time" : -1,
  "average_write_time_nanos" : -1,
  "average_remove_time" : -1,
  "average_remove_time_nanos" : -1,
  "required_minimum_number_of_nodes" : -1
}
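If you save the stats output to a file, you can post-process it. The following Python sketch lists the fields reported as -1 and computes a hit ratio as hits / (hits + misses). It assumes that -1 marks a statistic that is not available, as in the mycache example above. It also assumes that this ratio matches the server's own hit_ratio, which you should confirm against your server's output. The function name is hypothetical. When there were no reads, the sketch returns None rather than 0.0.

```python
import json
from typing import Optional


def summarize_stats(raw: str) -> dict:
    """Summarize the JSON printed by the CLI `stats` command.

    Assumption: a value of -1 means the statistic is not available, as in
    the mycache example above.
    """
    stats = json.loads(raw)
    unavailable = sorted(key for key, value in stats.items() if value == -1)
    hits = stats.get("hits", -1)
    misses = stats.get("misses", -1)
    hit_ratio: Optional[float] = None
    if hits >= 0 and misses >= 0 and hits + misses > 0:
        # Share of reads that found an entry: hits / (hits + misses).
        hit_ratio = hits / (hits + misses)
    return {"unavailable": unavailable, "hit_ratio": hit_ratio}
```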

17.4. JVM settings for Data Grid

You can define Java Virtual Machine (JVM) settings for Data Grid either by editing the server.conf configuration file, or by setting the JAVA_OPTS and JAVA_OPTIONS environment variables. Set the JAVA_OPTIONS variable if you just want to append JVM settings to those that are automatically set by the server.conf file. Set the JAVA_OPTS variable if you want to completely override the JVM settings.

Important

If you are running Data Grid in a container, do not set -Xmx or -Xms, because the values are calculated automatically from the container settings as 70% of the container size.
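To sanity-check the heap size that you observe in a container, you can reproduce the 70% arithmetic. The following Python sketch assumes that the container memory limit is exposed through cgroup v2 at /sys/fs/cgroup/memory.max. It rounds down to whole MiB; the exact rounding that the server applies is not described here. Both function names are hypothetical.

```python
from typing import Optional

MIB = 1024 * 1024


def expected_max_heap_mib(container_memory_bytes: int) -> int:
    """Approximate heap size: 70% of the container memory, in whole MiB."""
    return container_memory_bytes * 70 // 100 // MIB


def read_cgroup_v2_memory_limit(path: str = "/sys/fs/cgroup/memory.max") -> Optional[int]:
    """Read the container memory limit from cgroup v2, or None if unlimited or unknown."""
    try:
        with open(path, encoding="ascii") as f:
            value = f.read().strip()
    except OSError:
        return None
    return None if value == "max" else int(value)
```

For example, a container with a 2 GiB memory limit yields expected_max_heap_mib(2 * 1024**3) == 1433.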

Editing the configuration file

You can edit the required values in the server.conf configuration file. For example, to set the options to pass to the JVM, edit the following lines:

if [ "$JAVA_OPTS" = "" ]; then
   JAVA_OPTS="..."
else
   echo "JAVA_OPTS already set in environment; overriding default settings with values: $JAVA_OPTS"
fi

You can uncomment the existing example settings as well. For example, to configure Java Platform Debugger Architecture (JPDA) settings for remote socket debugging, update the file as follows:

# Sample JPDA settings for remote socket debugging
JAVA_OPTS="$JAVA_OPTS -agentlib:jdwp=transport=dt_socket,address=8787,server=y,suspend=n"

Additionally, you can add more settings to JAVA_OPTS, separating each option with a space:

JAVA_OPTS="$JAVA_OPTS <key_1>=<value_1> ... <key_N>=<value_N>"

Setting an environment variable

You can override the settings in the server.conf configuration file by setting the JAVA_OPTS environment variable. For example:

Linux

export JAVA_OPTS="-Xmx1024M"

Windows

set "JAVA_OPTS=-Xmx1024M"

17.4.1. Garbage Collection Logging

Data Grid automatically writes garbage collection logs to the $ISPN_HOME/server/log directory. To disable these logs, set the GC_LOG environment variable to false before launching the server.

Linux

export GC_LOG="false"

Windows

set "GC_LOG=false"

17.4.2. Heap Dump on OutOfMemoryError

Data Grid automatically writes a heap dump to the $ISPN_HOME/server/log directory when an OutOfMemoryError occurs. Heap dumps are useful for troubleshooting memory issues. You can use tools such as VisualVM or Eclipse Memory Analyzer to inspect heap dumps. The jhat tool, which earlier JDKs included, was removed in JDK 9. Disable heap dumps by setting the HEAP_DUMP environment variable to false before launching the server.

Linux

export HEAP_DUMP="false"

Windows

set "HEAP_DUMP=false"

Warning

Heap dumps are snapshots of the in-memory data you have stored in Data Grid caches and may therefore contain sensitive data. Consider sanitizing the heap dumps before sharing them with third parties by using a tool such as the Heap Dump Tool.
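If you start the server from a Python script, for example in a test harness, you can set GC_LOG and HEAP_DUMP only for that process. The following sketch covers Linux only. It assumes that the start script is at bin/server.sh under your $ISPN_HOME directory. The function names server_env and start_server are hypothetical.

```python
import os
import subprocess


def server_env(gc_log: bool = True, heap_dump: bool = True, base=None) -> dict:
    """Return a copy of the environment with GC_LOG and HEAP_DUMP disabled as requested."""
    env = dict(os.environ if base is None else base)
    if not gc_log:
        env["GC_LOG"] = "false"
    if not heap_dump:
        env["HEAP_DUMP"] = "false"
    return env


def start_server(server_home: str, **flags) -> subprocess.Popen:
    """Start bin/server.sh (Linux) with the adjusted environment."""
    script = os.path.join(server_home, "bin", "server.sh")
    return subprocess.Popen([script], env=server_env(**flags))
```

For example, start_server(os.environ["ISPN_HOME"], gc_log=False, heap_dump=False) starts the server with both features disabled. The environment of the calling shell is left unchanged.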

17.5. Virtual Threads Support

Infinispan supports virtual threads, which can significantly improve application responsiveness and scalability under high concurrency. By default, they are enabled if you are running on JDK 21 or higher.

Warning

On systems that run a JDK version earlier than 24 and have a low CPU count (two or fewer), Data Grid might experience thread pinning issues when using virtual threads. Thread pinning occurs when a virtual thread cannot unmount from the platform (OS) thread that carries it, for example while it blocks inside a synchronized block. With only a few carrier threads available, pinned virtual threads can occupy all of them, potentially leading to performance degradation or system freezes.

To work around this problem, disable virtual threads as described in the following procedure, or increase the virtual thread scheduler parallelism with the Java option -Djdk.virtualThreadScheduler.parallelism=<value>. A common starting point for <value> on low-CPU systems is 4.
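The decision described in this warning can be expressed as a small function. The following Python sketch returns the scheduler parallelism option only when the warning applies. It uses the JDK 21 default-enablement rule and the starting value of 4 from the text above. The function name is hypothetical, and you must supply the JDK major version and CPU count yourself.

```python
def virtual_thread_options(jdk_major: int, cpu_count: int, parallelism: int = 4) -> list[str]:
    """Return extra JVM options suggested by the thread pinning warning above.

    The list is empty when the warning does not apply: JDK 24 or later, more
    than two CPUs, or a JDK older than 21 (no virtual threads by default).
    """
    if jdk_major < 21 or jdk_major >= 24 or cpu_count > 2:
        return []
    # Give the virtual thread scheduler more carrier threads than CPUs.
    return [f"-Djdk.virtualThreadScheduler.parallelism={parallelism}"]
```

Setting JAVA_OPTS in the environment replaces all default JVM settings. For that reason, appending the returned options to the JAVA_OPTIONS variable is probably the safer choice.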

17.5.1. Disable Virtual Threads

Disable the virtual threads pool.

  1. Set the -Dorg.infinispan.threads.virtual=false option in the JAVA_OPTS environment variable before starting the Data Grid server.

    Alternatively, you can append this option directly to the command used to start the Data Grid server:

    bin/server.sh -Dorg.infinispan.threads.virtual=false
  2. Verify by checking the server logs. The following log entry should not be present:

    Virtual threads support enabled
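To automate the verification step, you can search the log for the message above. The following Python sketch takes the log file path as a parameter. The file name server.log under $ISPN_HOME/server/log is an assumption about your setup. A log file can also contain entries from earlier server starts, so check a log written after the restart. The function name is hypothetical.

```python
MARKER = "Virtual threads support enabled"


def virtual_threads_enabled(log_file: str) -> bool:
    """Return True if the log file contains the virtual threads startup message."""
    with open(log_file, encoding="utf-8", errors="replace") as log:
        return any(MARKER in line for line in log)
```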

17.6. Accessing cluster health via REST

Get Data Grid cluster health via the REST API.

Procedure

  • Invoke a GET request to retrieve cluster health.

    GET /rest/v2/container/health

Data Grid responds with a JSON document such as the following:

{
    "cluster_health":{
        "cluster_name":"ISPN",
        "health_status":"HEALTHY",
        "number_of_nodes":2,
        "node_names":[
            "NodeA-36229",
            "NodeB-28703"
        ]
    },
    "cache_health":[
        {
            "status":"HEALTHY",
            "cache_name":"___protobuf_metadata"
        },
        {
            "status":"HEALTHY",
            "cache_name":"cache2"
        },
        {
            "status":"HEALTHY",
            "cache_name":"mycache"
        },
        {
            "status":"HEALTHY",
            "cache_name":"cache1"
        }
    ]
}

Tip

Get Cache Manager status as follows:

GET /rest/v2/container/health/status
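The following Python sketch retrieves the health document with the standard library and lists everything that does not report HEALTHY. It assumes that the server listens on localhost:11222 and that the REST endpoint accepts HTTP Basic authentication with credentials that you supply. Use HTTPS in practice. It covers only the /rest/v2/container/health resource, not the status resource. The function names are hypothetical.

```python
import base64
import json
import urllib.request

# Assumptions: the server listens on localhost:11222 and the REST endpoint
# accepts HTTP Basic authentication. Adjust both for your deployment.
BASE_URL = "http://localhost:11222"


def fetch_health(user: str, password: str, base_url: str = BASE_URL) -> dict:
    """GET /rest/v2/container/health and decode the JSON document."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(
        base_url + "/rest/v2/container/health",
        headers={"Authorization": f"Basic {token}", "Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def unhealthy(health: dict) -> list[str]:
    """List the cluster and caches whose status is anything other than HEALTHY."""
    problems = []
    cluster = health["cluster_health"]
    if cluster["health_status"] != "HEALTHY":
        problems.append(f"cluster {cluster['cluster_name']}: {cluster['health_status']}")
    for cache in health["cache_health"]:
        if cache["status"] != "HEALTHY":
            problems.append(f"cache {cache['cache_name']}: {cache['status']}")
    return problems
```

For the example response above, unhealthy returns an empty list.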

Reference

See the REST v2 (version 2) API documentation for more information.

17.7. Accessing cluster health via JMX

Retrieve Data Grid cluster health statistics via JMX.

Procedure

  1. Connect to Data Grid Server using any JMX-capable tool, such as JConsole, and navigate to the following object:

    org.infinispan:type=CacheManager,name="default",component=CacheContainerHealth
  2. Select available MBeans to retrieve cluster health statistics.

17.8. Benchmark the Data Grid Server

Data Grid provides a built-in benchmark utility, available through the CLI, that you can use to validate server configuration changes. The benchmark runs write and read operations and submits requests as fast as possible. Because this workload is artificial and might not represent your production environment, interpret the results with care. Nevertheless, the benchmark offers a simple way to compare the server's behavior before and after a configuration change.

Check the CLI command documentation for more information.
