Chapter 17. Troubleshooting Data Grid Server deployments

17.1. Getting diagnostic reports from Data Grid Server

Data Grid Server provides aggregated reports in tar.gz archives that contain diagnostic information about server instances and host systems. The report provides details about CPU, memory, open files, network sockets and routing, threads, in addition to configuration and log files.

Procedure

Create a CLI connection to Data Grid Server.

Use the server report command to download a tar.gz archive:

server report

The command responds with the name of the report, as in the following example:

Downloaded report 'infinispan-<hostname>-<timestamp>-report.tar.gz'

Move the tar.gz file to a suitable location on your filesystem.
Extract the tar.gz file with any archiving tool.
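
The extraction step can also be scripted. The following is a minimal Python sketch, not part of Data Grid, that unpacks a report archive with the standard tarfile module and returns the names of the files it contains. The function name extract_report and the destination path are illustrative assumptions.

```python
import tarfile
from pathlib import Path


def extract_report(archive, dest):
    """Unpack a tar.gz report into dest and return the archive member names."""
    dest_path = Path(dest)
    dest_path.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        names = tar.getnames()
        # Use the "data" extraction filter where this Python version offers it,
        # so that members with absolute or parent-relative paths are rejected.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        tar.extractall(dest_path, **kwargs)
    return names
```

Call the function with the file name that the server report command printed and a directory of your choice, then inspect the returned list to see which files the report contains.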

17.1.1. Understanding the report

The report generated with the CLI contains different categories of files containing information about the environment and the application. We split the files into subcategories and briefly describe how to read each one.

17.1.1.1. Environment

These files contain metrics about the environment in which Data Grid Server runs. Beyond the behavior of the application itself, external factors can cause issues, such as network misconfiguration or many processes running on the same host. The following files help you investigate these factors:

Host:

cpuinfo: Provides information about the host’s CPU.
df: Provides information about the host’s filesystem.
lsof: Describes the open files in the host.
meminfo: Provides information about the host’s memory.
os-release: Describes the current OS the server is running on.
uname: Provides additional system information.

Network:

ip-address: Describes the network interfaces in the host and the address and capabilities of each one.
ip-maddress: Describes the multicast addresses on each network interface in the host.
ip-mroute: Describes the multicast routing table in the host.
ip-route: Describes the routing table in the host.
ss-tcp: Lists the TCP sockets in the host.
ss-udp: Lists the UDP sockets in the host.
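
Once the archive is extracted, the host files can be inspected with ordinary text tools. As a sketch, and assuming that the meminfo file uses the same "Key: value kB" layout as /proc/meminfo on Linux (an assumption; the report does not guarantee it), the following Python function reads the values into a dictionary. The name parse_meminfo and the sample figures are hypothetical.

```python
def parse_meminfo(text):
    """Map each 'Key: value [kB]' line to an integer value (kB when a unit is present)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if not sep or not fields or not fields[0].isdigit():
            continue  # skip blank or unexpected lines
        values[key.strip()] = int(fields[0])
    return values


# Made-up sample in the assumed /proc/meminfo layout.
sample = "MemTotal:       16384000 kB\nMemAvailable:    8192000 kB\nHugePages_Total:       0\n"
info = parse_meminfo(sample)
print(info["MemAvailable"] / info["MemTotal"])  # fraction of memory still available
```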

17.1.1.2. Data Grid server

The report also collects information from the running server. It generates a thread dump with the jstack command and includes the server logs and configuration files.

Note

The report tries to collect all this information. However, some files might be missing if the host cannot run a specific command.

17.2. Changing Data Grid Server logging configuration at runtime

Modify the logging configuration for Data Grid Server at runtime to temporarily adjust logging to troubleshoot issues and perform root cause analysis.

Modifying the logging configuration through the CLI is a runtime-only operation, which means that changes:

Are not saved to the log4j2.xml file. Restarting server nodes or the entire cluster resets the logging configuration to the default properties in the log4j2.xml file.
Apply only to the nodes that are in the cluster when you invoke the CLI. Nodes that join the cluster after you change the logging configuration use the default properties.

Procedure

Create a CLI connection to Data Grid Server.

Use the logging command to make the required adjustments.

List all appenders defined on the server:

logging list-appenders

The command provides a JSON response such as the following:

{
  "STDOUT" : {
    "name" : "STDOUT"
  },
  "JSON-FILE" : {
    "name" : "JSON-FILE"
  },
  "HR-ACCESS-FILE" : {
    "name" : "HR-ACCESS-FILE"
  },
  "FILE" : {
    "name" : "FILE"
  },
  "REST-ACCESS-FILE" : {
    "name" : "REST-ACCESS-FILE"
  }
}

List all logger configurations defined on the server:

logging list-loggers

The command provides a JSON response such as the following:

[ {
  "name" : "",
  "level" : "INFO",
  "appenders" : [ "STDOUT", "FILE" ]
}, {
  "name" : "org.infinispan.HOTROD_ACCESS_LOG",
  "level" : "INFO",
  "appenders" : [ "HR-ACCESS-FILE" ]
}, {
  "name" : "com.arjuna",
  "level" : "WARN",
  "appenders" : [ ]
}, {
  "name" : "org.infinispan.REST_ACCESS_LOG",
  "level" : "INFO",
  "appenders" : [ "REST-ACCESS-FILE" ]
} ]
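
The JSON responses lend themselves to scripting. The following Python sketch, which is illustrative and not part of the CLI, takes a list-loggers response like the one above and reports each logger's level together with any logger that has no appender of its own. The function name summarize_loggers is hypothetical, and the "<root>" label for the empty logger name is a display choice made here.

```python
import json


def summarize_loggers(response):
    """Return ({logger name: level}, [loggers without appenders]) from the JSON text."""
    loggers = json.loads(response)
    levels = {}
    without_appenders = []
    for logger in loggers:
        name = logger["name"] or "<root>"  # the root logger has an empty name
        levels[name] = logger["level"]
        if not logger["appenders"]:
            without_appenders.append(name)
    return levels, without_appenders


# Shortened copy of the sample response shown above.
response = """[
  {"name": "", "level": "INFO", "appenders": ["STDOUT", "FILE"]},
  {"name": "com.arjuna", "level": "WARN", "appenders": []}
]"""
levels, without_appenders = summarize_loggers(response)
print(levels)
print(without_appenders)
```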

Add and modify logger configurations with the set subcommand.
For example, the following command sets the logging level for the org.infinispan package to DEBUG:
```
logging set --level=DEBUG org.infinispan
```
Remove existing logger configurations with the remove subcommand.
For example, the following command removes the org.infinispan logger configuration, which means the root configuration is used instead:
```
logging remove org.infinispan
```

17.2.1. Access logging

The logging command also lets you enable access logs at runtime to analyze issues. You can enable access logging for a specific endpoint and then use the CLI command to parse the resulting files and generate server-side metrics. This technique helps identify whether latency originates in the connection between client and server or within the cluster.

The workflow for this analysis involves a few steps:

Enable access logging with the CLI for the endpoint you are analyzing.
```
logging set --level=TRACE org.infinispan.RESP_ACCESS_LOG
```
This example enables the access log for the RESP endpoint.
Run the tests to reproduce the issue or workload to measure.

Disable access logging with the CLI.

logging set --level=INFO org.infinispan.RESP_ACCESS_LOG

Download the access log files.
The server report command is also capable of downloading the log files.

With the access log files available, you can utilize the CLI command to analyze the server’s behavior more deeply. The CLI command provides information about long-running commands, segmentation per client, and operations.
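
The enable and disable steps in this workflow differ only in the level. A minimal Python sketch that produces the two CLI command strings for a given access log category follows; it only builds text and does not talk to the server. The helper name access_log_commands and the short endpoint keys are hypothetical, the category names are the ones that appear in this chapter, and applying the same TRACE and INFO pair to every category is an assumption based on the RESP example.

```python
# Access log categories as they appear in this chapter; the short keys are made up here.
ACCESS_LOG_CATEGORIES = {
    "hotrod": "org.infinispan.HOTROD_ACCESS_LOG",
    "rest": "org.infinispan.REST_ACCESS_LOG",
    "resp": "org.infinispan.RESP_ACCESS_LOG",
}


def access_log_commands(endpoint):
    """Return the (enable, disable) CLI commands for the endpoint's access log."""
    category = ACCESS_LOG_CATEGORIES[endpoint]
    enable = f"logging set --level=TRACE {category}"
    disable = f"logging set --level=INFO {category}"
    return enable, disable


enable, disable = access_log_commands("resp")
print(enable)   # run before reproducing the issue
print(disable)  # run after the test finishes
```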

17.3. Gathering resource statistics from the CLI
复制链接

You can inspect server-collected statistics for some Data Grid Server resources with the stats command.

Use the stats command either from the context of a resource that provides statistics (containers, caches) or with a path to such a resource:

stats

{
  "statistics_enabled" : true,
  "number_of_entries" : 0,
  "hit_ratio" : 0.0,
  "read_write_ratio" : 0.0,
  "time_since_start" : 0,
  "time_since_reset" : 49,
  "current_number_of_entries" : 0,
  "current_number_of_entries_in_memory" : 0,
  "off_heap_memory_used" : 0,
  "data_memory_used" : 0,
  "stores" : 0,
  "retrievals" : 0,
  "hits" : 0,
  "misses" : 0,
  "remove_hits" : 0,
  "remove_misses" : 0,
  "evictions" : 0,
  "average_read_time" : 0,
  "average_read_time_nanos" : 0,
  "average_write_time" : 0,
  "average_write_time_nanos" : 0,
  "average_remove_time" : 0,
  "average_remove_time_nanos" : 0,
  "required_minimum_number_of_nodes" : -1
}

stats /containers/default/caches/mycache

{
  "time_since_start" : -1,
  "time_since_reset" : -1,
  "current_number_of_entries" : -1,
  "current_number_of_entries_in_memory" : -1,
  "off_heap_memory_used" : -1,
  "data_memory_used" : -1,
  "stores" : -1,
  "retrievals" : -1,
  "hits" : -1,
  "misses" : -1,
  "remove_hits" : -1,
  "remove_misses" : -1,
  "evictions" : -1,
  "average_read_time" : -1,
  "average_read_time_nanos" : -1,
  "average_write_time" : -1,
  "average_write_time_nanos" : -1,
  "average_remove_time" : -1,
  "average_remove_time_nanos" : -1,
  "required_minimum_number_of_nodes" : -1
}
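
The statistics are plain JSON, so derived figures are easy to compute. The sketch below is a hedged Python illustration: it assumes that a value of -1, as in the second response, marks a statistic that is not available, and it uses the conventional definition hits / (hits + misses) for the hit ratio. Neither assumption is stated in this chapter, and the function name hit_ratio is hypothetical.

```python
import json


def hit_ratio(stats_json):
    """Return hits / (hits + misses), or None when the figures are unavailable."""
    stats = json.loads(stats_json)
    hits = stats.get("hits", -1)
    misses = stats.get("misses", -1)
    if hits < 0 or misses < 0:
        return None  # assumed meaning of -1: statistic not collected
    total = hits + misses
    return hits / total if total else 0.0


print(hit_ratio('{"hits": 30, "misses": 10}'))   # 0.75
print(hit_ratio('{"hits": -1, "misses": -1}'))   # None
```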

17.4. JVM settings for Data Grid

You can define Java Virtual Machine (JVM) settings for Data Grid either by editing the server.conf configuration file or by setting the JAVA_OPTS or JAVA_OPTIONS environment variable. Set the JAVA_OPTIONS variable if you only want to append JVM settings to those that the server.conf file sets automatically. Set the JAVA_OPTS variable if you want to completely override the JVM settings.

Important

If you are running Data Grid in a container, do not set Xmx or Xms, because the values are automatically calculated from the container settings to be 70% of the container size.

Editing the configuration file

You can edit the required values in the server.conf configuration file. For example, to set the options to pass to the JVM, edit the following lines:

if [ "$JAVA_OPTS" = "" ]; then
   JAVA_OPTS="..."
else
   echo "JAVA_OPTS already set in environment; overriding default settings with values: $JAVA_OPTS"
fi

You can uncomment the existing example settings as well. For example, to configure Java Platform Debugger Architecture (JPDA) settings for remote socket debugging, update the file as follows:

# Sample JPDA settings for remote socket debugging
JAVA_OPTS="$JAVA_OPTS -agentlib:jdwp=transport=dt_socket,address=8787,server=y,suspend=n"

Additionally, you can add more settings to JAVA_OPTS like this:

JAVA_OPTS="$JAVA_OPTS <key_1>=<value_1> ... <key_N>=<value_N>"

Setting an environment variable

You can override the settings in the server.conf configuration file by setting the JAVA_OPTS environment variable. For example:

Linux

export JAVA_OPTS="-Xmx1024M"

Windows

set "JAVA_OPTS=-Xmx1024M"
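
The relationship between the two variables can be modeled in a few lines. The following Python sketch illustrates the behavior described in this section and is not the logic of the actual startup script: JAVA_OPTS, when set, replaces the defaults from server.conf, and JAVA_OPTIONS is appended. The function name, the placeholder defaults, and the handling of the case where both variables are set are assumptions.

```python
def effective_jvm_options(defaults, env):
    """Model how JAVA_OPTS overrides and JAVA_OPTIONS appends to the defaults."""
    # JAVA_OPTS set in the environment replaces the server.conf defaults entirely.
    base = env.get("JAVA_OPTS") or defaults
    # JAVA_OPTIONS is appended (assumed to apply even when JAVA_OPTS is set).
    extra = env.get("JAVA_OPTIONS", "")
    return f"{base} {extra}".strip()


defaults = "-Xms64m -Xmx512m"  # placeholder defaults, not the real server.conf values
print(effective_jvm_options(defaults, {"JAVA_OPTIONS": "-Dexample=true"}))
print(effective_jvm_options(defaults, {"JAVA_OPTS": "-Xmx1024M"}))
```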

17.4.1. Garbage Collection Logging

Data Grid automatically writes garbage collection logs to the $ISPN_HOME/server/log directory. Disable the logs by setting the GC_LOG environment variable to false before launching the server.

Linux

export GC_LOG="false"

Windows

set "GC_LOG=false"

17.4.2. Heap Dump on OutOfMemoryError

Data Grid automatically writes a heap dump to the $ISPN_HOME/server/log directory when an OutOfMemoryError occurs. Heap dumps are useful for troubleshooting memory issues. You can use tools such as jhat (included only in JDK 8 and earlier), VisualVM or Eclipse Memory Analyzer to inspect heap dumps. Disable heap dumps by setting the HEAP_DUMP environment variable to false before launching the server.

Linux

export HEAP_DUMP="false"

Windows

set "HEAP_DUMP=false"

Warning

Heap dumps are snapshots of the in-memory data you have stored in Data Grid caches and may therefore contain sensitive data. Consider sanitizing the heap dumps before sharing them with third parties by using a tool such as the Heap Dump Tool.

17.5. Virtual Threads Support

Data Grid supports virtual threads, which can significantly improve application responsiveness and scalability under high concurrency. By default, they are enabled if you are running on JDK 21 or higher.

Warning

On systems with JDK versions prior to 24 and low CPU counts (2 or fewer), Data Grid might experience thread pinning issues when using virtual threads. Thread pinning is a situation where virtual threads are unexpectedly bound to a limited number of OS threads, potentially leading to performance degradation or system freezes.

To work around this problem, virtual threads can be disabled as described in the procedure below, or the virtual thread scheduler parallelism may be increased with the Java option -Djdk.virtualThreadScheduler.parallelism=<value>. A common starting point for <value> on low-CPU systems is 4.
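
The conditions in the warning can be expressed as a small check. The following Python sketch returns the suggested Java option when a host matches them and nothing otherwise. The function name and its parameters are hypothetical; the thresholds (JDK 21 or higher but earlier than 24, two or fewer CPUs, starting value 4) come from the text above.

```python
import os


def scheduler_parallelism_option(jdk_major, cpu_count=None, value=4):
    """Return the parallelism option for hosts at risk of thread pinning, else None."""
    cpus = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    # Virtual threads are enabled by default from JDK 21; pinning is a concern before JDK 24.
    at_risk = 21 <= jdk_major < 24 and cpus <= 2
    return f"-Djdk.virtualThreadScheduler.parallelism={value}" if at_risk else None


print(scheduler_parallelism_option(jdk_major=21, cpu_count=2))
print(scheduler_parallelism_option(jdk_major=24, cpu_count=2))
```

Treat the returned value as a starting point only, and tune it for your own workload.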

17.5.1. Disable Virtual Threads

Disable the virtual threads pool.

Set the -Dorg.infinispan.threads.virtual=false option in the JAVA_OPTS environment variable before starting the Data Grid server.
Alternatively, you can append this option directly to the command used to start the Data Grid server:
```
bin/server.sh -Dorg.infinispan.threads.virtual=false
```
Verify by checking the server logs. The following log entry should not be present:
```
Virtual threads support enabled
```
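
The log check can be automated. The sketch below, in Python, reports whether log text contains the entry. The function name is hypothetical and the log file location is an assumption, so adjust the path to your installation.

```python
from pathlib import Path

MARKER = "Virtual threads support enabled"


def virtual_threads_enabled(log_text):
    """Return True if any line of the log text contains the marker entry."""
    return any(MARKER in line for line in log_text.splitlines())


log_file = Path("server/log/server.log")  # assumed location; adjust as needed
if log_file.exists():
    print(virtual_threads_enabled(log_file.read_text(errors="replace")))
```

A result of False after a restart with the option set suggests that virtual threads are disabled.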

17.6. Accessing cluster health via REST
复制链接

Get Data Grid cluster health via the REST API.

Procedure

Invoke a GET request to retrieve cluster health.
```
GET /rest/v2/container/health
```

Data Grid responds with a JSON document such as the following:

{
    "cluster_health":{
        "cluster_name":"ISPN",
        "health_status":"HEALTHY",
        "number_of_nodes":2,
        "node_names":[
            "NodeA-36229",
            "NodeB-28703"
        ]
    },
    "cache_health":[
        {
            "status":"HEALTHY",
            "cache_name":"___protobuf_metadata"
        },
        {
            "status":"HEALTHY",
            "cache_name":"cache2"
        },
        {
            "status":"HEALTHY",
            "cache_name":"mycache"
        },
        {
            "status":"HEALTHY",
            "cache_name":"cache1"
        }
    ]

}
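
A monitoring script can reduce this document to a short verdict. The following Python sketch parses a response of the shape shown above and lists every cache whose status is not HEALTHY. The function name unhealthy_caches is hypothetical, and fetching the document from the server (host, port, credentials) is deliberately left out.

```python
import json


def unhealthy_caches(health_json):
    """Return (cluster status, names of caches whose status is not HEALTHY)."""
    health = json.loads(health_json)
    cluster_status = health["cluster_health"]["health_status"]
    failing = [
        cache["cache_name"]
        for cache in health["cache_health"]
        if cache["status"] != "HEALTHY"
    ]
    return cluster_status, failing


# Shortened copy of the sample response shown above.
sample = """{
  "cluster_health": {"cluster_name": "ISPN", "health_status": "HEALTHY",
                     "number_of_nodes": 2, "node_names": ["NodeA-36229", "NodeB-28703"]},
  "cache_health": [{"status": "HEALTHY", "cache_name": "mycache"}]
}"""
print(unhealthy_caches(sample))  # ('HEALTHY', [])
```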

Tip

Get Cache Manager status as follows:

GET /rest/v2/container/health/status

Reference

See the REST v2 (version 2) API documentation for more information.

17.7. Accessing cluster health via JMX

Retrieve Data Grid cluster health statistics via JMX.

Procedure

Connect to Data Grid Server using any JMX-capable tool, such as JConsole, and navigate to the following object:
```
org.infinispan:type=CacheManager,name="default",component=CacheContainerHealth
```
Select available MBeans to retrieve cluster health statistics.

17.8. Benchmark the Data Grid Server
复制链接

Data Grid provides a built-in benchmark utility, accessible through the CLI, to validate server configuration changes. The benchmark runs write and read operations, submitting requests as fast as possible. Because this contrived scenario might not represent a real production environment, interpret the numbers with care. Nevertheless, the approach is a simple way to verify the effect of configuration changes.

Check the CLI command documentation for more information.
