Chapter 17. Troubleshooting Data Grid Server deployments
Gather diagnostic information about Data Grid Server deployments and perform troubleshooting steps to resolve issues.
17.1. Getting diagnostic reports from Data Grid Server
Data Grid Server provides aggregated reports in tar.gz archives that contain diagnostic information about server instances and host systems. The report provides details about CPU, memory, open files, network sockets and routing, threads, in addition to configuration and log files.
Procedure
- Create a CLI connection to Data Grid Server.
- Use the server report command to download a tar.gz archive:
    server report
  The command responds with the name of the report, as in the following example:
    Downloaded report 'infinispan-<hostname>-<timestamp>-report.tar.gz'
- Move the tar.gz file to a suitable location on your filesystem.
- Extract the tar.gz file with any archiving tool.
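As a minimal sketch of the extraction step, the following Python snippet lists the files inside a report archive using the standard tarfile module. It builds a tiny stand-in archive so that it runs without a server; the file name sample-report.tar.gz and its single uname entry are hypothetical and only illustrate the idea.

```python
import tarfile
import tempfile
from pathlib import Path


def list_report_files(archive: Path) -> list[str]:
    """Return the sorted names of the regular files stored in a tar.gz archive."""
    with tarfile.open(archive, "r:gz") as tar:
        return sorted(m.name for m in tar.getmembers() if m.isfile())


def make_sample_report(directory: Path) -> Path:
    """Build a tiny stand-in archive so the sketch runs without a server."""
    source = directory / "uname"
    source.write_text("Linux example-host\n")
    archive = directory / "sample-report.tar.gz"  # hypothetical file name
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(source, arcname="uname")
    return archive


with tempfile.TemporaryDirectory() as tmp:
    sample = make_sample_report(Path(tmp))
    report_files = list_report_files(sample)
    print(report_files)
```

To inspect a real report, pass the path of the downloaded archive to list_report_files instead of the generated sample.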
17.1.1. Understanding the report
The report generated with the CLI contains different categories of files containing information about the environment and the application. We split the files into subcategories and briefly describe how to read each one.
17.1.1.1. Environment
These files contain metrics about the environment in which Data Grid Server runs. Issues do not always originate in the application itself: external factors, such as network misconfiguration or too many processes running on the same host, can also cause problems. The following files help you investigate these factors:
Host:
- cpuinfo: Provides information about the host’s CPU.
- df: Provides information about the host’s filesystem.
- lsof: Describes the open files in the host.
- meminfo: Provides information about the host’s memory.
- os-release: Describes the current OS the server is running on.
- uname: Additional system information.
Network:
- ip-address: Describes the network interfaces in the host and the address and capabilities of each one.
- ip-maddress: Describes the multicast address on each network interface in the host.
- ip-mroute: Describes the multicast routing table in the host.
- ip-route: Describes the routing table in the host.
- ss-tcp: Lists the TCP sockets in the host.
- ss-udp: Lists the UDP sockets in the host.
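As an illustration of how one of these files might be read programmatically, the following sketch parses memory figures. It assumes that the meminfo file in the report uses the same "Name: value kB" layout as /proc/meminfo on Linux, which the text above does not state; the sample values are made up.

```python
def parse_meminfo(text: str) -> dict[str, int]:
    """Map each field to its numeric value (kB where a unit is given)."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if not sep or not parts:
            continue  # skip lines that do not look like "Name: value [unit]"
        values[name.strip()] = int(parts[0])
    return values


# Made-up sample in the /proc/meminfo layout (an assumption about the file format).
sample = (
    "MemTotal:       16384000 kB\n"
    "MemAvailable:    4096000 kB\n"
    "HugePages_Total:       0\n"
)
info = parse_meminfo(sample)
available_pct = 100 * info["MemAvailable"] / info["MemTotal"]
print(f"{available_pct:.1f}% of memory available")
```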
17.1.1.2. Data Grid server
Additionally, the report collects information from the running server. The report generates a thread dump utilizing the jstack command. The server logs and configurations are also included in the files.
The report tries to collect all this information. However, some files might be missing if the host cannot run a specific command.
17.2. Changing Data Grid Server logging configuration at runtime
Modify the logging configuration for Data Grid Server at runtime to temporarily adjust logging to troubleshoot issues and perform root cause analysis.
Modifying the logging configuration through the CLI is a runtime-only operation, which means that changes:
- Are not saved to the log4j2.xml file. Restarting server nodes or the entire cluster resets the logging configuration to the default properties in the log4j2.xml file.
- Apply only to the nodes in the cluster when you invoke the CLI. Nodes that join the cluster after you change the logging configuration use the default properties.
Procedure
- Create a CLI connection to Data Grid Server.
- Use the logging command to make the required adjustments.
  - List all appenders defined on the server:
      logging list-appenders
    The command provides a JSON response such as the following:
      {
        "STDOUT" : { "name" : "STDOUT" },
        "JSON-FILE" : { "name" : "JSON-FILE" },
        "HR-ACCESS-FILE" : { "name" : "HR-ACCESS-FILE" },
        "FILE" : { "name" : "FILE" },
        "REST-ACCESS-FILE" : { "name" : "REST-ACCESS-FILE" }
      }
  - List all logger configurations defined on the server:
      logging list-loggers
    The command provides a JSON response such as the following:
      [
        { "name" : "", "level" : "INFO", "appenders" : [ "STDOUT", "FILE" ] },
        { "name" : "org.infinispan.HOTROD_ACCESS_LOG", "level" : "INFO", "appenders" : [ "HR-ACCESS-FILE" ] },
        { "name" : "com.arjuna", "level" : "WARN", "appenders" : [ ] },
        { "name" : "org.infinispan.REST_ACCESS_LOG", "level" : "INFO", "appenders" : [ "REST-ACCESS-FILE" ] }
      ]
  - Add and modify logger configurations with the set subcommand.
    For example, the following command sets the logging level for the org.infinispan package to DEBUG:
      logging set --level=DEBUG org.infinispan
  - Remove existing logger configurations with the remove subcommand.
    For example, the following command removes the org.infinispan logger configuration, which means the root configuration is used instead:
      logging remove org.infinispan
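Because runtime changes are easy to forget, it can help to check which loggers are still set to a verbose level before you finish troubleshooting. The following sketch filters a list-loggers response saved as text. The sample response is hypothetical: it assumes that an org.infinispan entry at DEBUG appears in the list after the set command shown above, and the set of levels treated as verbose is a choice made for this example.

```python
import json

# Levels treated as verbose for this example (an assumption, adjust as needed).
VERBOSE_LEVELS = {"DEBUG", "TRACE", "ALL"}


def verbose_loggers(list_loggers_output: str) -> list[str]:
    """Return the names of loggers whose level is more verbose than INFO."""
    loggers = json.loads(list_loggers_output)
    return [
        entry["name"] or "<root>"  # the root logger has an empty name
        for entry in loggers
        if entry["level"] in VERBOSE_LEVELS
    ]


# Hypothetical response after running: logging set --level=DEBUG org.infinispan
sample_output = """
[ { "name" : "", "level" : "INFO", "appenders" : [ "STDOUT", "FILE" ] },
  { "name" : "org.infinispan", "level" : "DEBUG", "appenders" : [ ] },
  { "name" : "com.arjuna", "level" : "WARN", "appenders" : [ ] } ]
"""
print(verbose_loggers(sample_output))
```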
17.2.1. Access logging
The logging command also lets you enable access logs at runtime to analyze issues. You can enable access logging for a specific endpoint and then use the CLI to parse the resulting files and generate server-side metrics. This technique helps identify whether latency originates in the connection between client and server or within the cluster.
The workflow for this analysis involves a few steps:
- Enable access logging with the CLI for the endpoint you are analyzing.
    logging set --level=TRACE org.infinispan.RESP_ACCESS_LOG
  This example enables the access log for the RESP endpoint.
- Run the tests to reproduce the issue or workload to measure.
- Disable access logging with the CLI.
    logging set --level=INFO org.infinispan.RESP_ACCESS_LOG
- Download the access log files.
  The server report command is also capable of downloading the log files.
With the access log files available, you can utilize the CLI command to analyze the server’s behavior more deeply. The CLI command provides information about long-running commands, segmentation per client, and operations.
Categories
Data Grid provides an access log for each endpoint available on the server. You can edit the logging level to enable each endpoint individually. The loggers available for each endpoint are:
- org.infinispan.HOTROD_ACCESS_LOG: Enables the access log for the Hot Rod endpoint.
- org.infinispan.REST_ACCESS_LOG: Enables the access log for the REST endpoint.
- org.infinispan.MEMCACHED_ACCESS_LOG: Enables the access log for the Memcached endpoint.
- org.infinispan.RESP_ACCESS_LOG: Enables the access log for the RESP endpoint.
Observe that each logger can be activated individually and independently of the others.
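The enable and disable steps of the workflow can be sketched as a small helper that builds the CLI commands for a given endpoint. The logger names come from the list above. The short endpoint keys are hypothetical, and using TRACE to enable and INFO to disable for endpoints other than RESP is an assumption extrapolated from the RESP example.

```python
# Logger names from the list above; the dictionary keys are hypothetical shorthand.
ACCESS_LOGGERS = {
    "hotrod": "org.infinispan.HOTROD_ACCESS_LOG",
    "rest": "org.infinispan.REST_ACCESS_LOG",
    "memcached": "org.infinispan.MEMCACHED_ACCESS_LOG",
    "resp": "org.infinispan.RESP_ACCESS_LOG",
}


def access_log_commands(endpoint: str) -> tuple[str, str]:
    """Return the (enable, disable) CLI commands for one endpoint's access log."""
    logger = ACCESS_LOGGERS[endpoint.lower()]
    enable = f"logging set --level=TRACE {logger}"
    disable = f"logging set --level=INFO {logger}"
    return enable, disable


enable_cmd, disable_cmd = access_log_commands("resp")
print(enable_cmd)
print(disable_cmd)
```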
17.3. Gathering resource statistics from the CLI
You can inspect server-collected statistics for some Data Grid Server resources with the stats command.
Use the stats command either from the context of a resource that provides statistics (containers, caches) or with a path to such a resource:
stats
{
"statistics_enabled" : true,
"number_of_entries" : 0,
"hit_ratio" : 0.0,
"read_write_ratio" : 0.0,
"time_since_start" : 0,
"time_since_reset" : 49,
"current_number_of_entries" : 0,
"current_number_of_entries_in_memory" : 0,
"off_heap_memory_used" : 0,
"data_memory_used" : 0,
"stores" : 0,
"retrievals" : 0,
"hits" : 0,
"misses" : 0,
"remove_hits" : 0,
"remove_misses" : 0,
"evictions" : 0,
"average_read_time" : 0,
"average_read_time_nanos" : 0,
"average_write_time" : 0,
"average_write_time_nanos" : 0,
"average_remove_time" : 0,
"average_remove_time_nanos" : 0,
"required_minimum_number_of_nodes" : -1
}
stats /containers/default/caches/mycache
{
"time_since_start" : -1,
"time_since_reset" : -1,
"current_number_of_entries" : -1,
"current_number_of_entries_in_memory" : -1,
"off_heap_memory_used" : -1,
"data_memory_used" : -1,
"stores" : -1,
"retrievals" : -1,
"hits" : -1,
"misses" : -1,
"remove_hits" : -1,
"remove_misses" : -1,
"evictions" : -1,
"average_read_time" : -1,
"average_read_time_nanos" : -1,
"average_write_time" : -1,
"average_write_time_nanos" : -1,
"average_remove_time" : -1,
"average_remove_time_nanos" : -1,
"required_minimum_number_of_nodes" : -1
}
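As a rough sketch of how the counters can be post-processed, the following snippet derives a hit ratio from the hits and misses fields of a stats response saved as text. It treats negative values as "statistics not available", which is an interpretation of the -1 values in the second example above rather than documented behavior; the sample numbers are made up.

```python
import json
from typing import Optional


def hit_ratio(stats_output: str) -> Optional[float]:
    """Return hits / (hits + misses), or None when the counters are unavailable."""
    stats = json.loads(stats_output)
    hits, misses = stats["hits"], stats["misses"]
    if hits < 0 or misses < 0:
        return None  # assumption: -1 means statistics are not being collected
    total = hits + misses
    return hits / total if total else 0.0


print(hit_ratio('{"hits": 30, "misses": 10}'))
print(hit_ratio('{"hits": -1, "misses": -1}'))
```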
17.4. JVM settings for Data Grid
You can define Java Virtual Machine (JVM) settings for Data Grid either by editing the server.conf configuration file, or by setting the JAVA_OPTS and JAVA_OPTIONS environment variables. Set the JAVA_OPTIONS variable if you just want to append JVM settings to those that are automatically set by the server.conf file. Set the JAVA_OPTS variable if you want to completely override the JVM settings.
If you are running Data Grid in a container, do not set -Xmx or -Xms, because the values are automatically calculated from the container settings to be 70% of the container size.
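To get a feel for the resulting heap size, the 70% rule can be worked through for a given container memory limit. This is only an arithmetic sketch: the function name is hypothetical and rounding down to whole MiB is an assumption, so the value the server actually computes may differ slightly.

```python
def container_heap_mib(container_memory_mib: int, percent: int = 70) -> int:
    """Estimate the heap size in MiB as a percentage of the container memory."""
    # Integer arithmetic avoids floating-point rounding; rounding down is an assumption.
    return container_memory_mib * percent // 100


print(container_heap_mib(2048))  # heap estimate for a 2 GiB container
```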
Editing the configuration file
You can edit the required values in the server.conf configuration file. For example, to set the options to pass to the JVM, edit the following lines:
if [ "$JAVA_OPTS" = "" ]; then
JAVA_OPTS="..."
else
echo "JAVA_OPTS already set in environment; overriding default settings with values: $JAVA_OPTS"
fi
You can uncomment the existing example settings as well. For example, to configure Java Platform Debugger Architecture (JPDA) settings for remote socket debugging, update the file as follows:
# Sample JPDA settings for remote socket debugging
JAVA_OPTS="$JAVA_OPTS -agentlib:jdwp=transport=dt_socket,address=8787,server=y,suspend=n"
Additionally, you can add more settings to JAVA_OPTS like this:
JAVA_OPTS="$JAVA_OPTS <key_1>=<value_1>, ..., <key_N>=<value_N> "
Setting an environment variable
You can override the settings in the server.conf configuration file by setting the JAVA_OPTS environment variable. For example:
Linux
export JAVA_OPTS="-Xmx1024M"
Windows
set "JAVA_OPTS=-Xmx1024M"
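The difference between overriding with JAVA_OPTS and appending with JAVA_OPTIONS can be modeled in a few lines. This is a simplified model of the behavior described above, not the server's launch script: the default options are made up, and appending JAVA_OPTIONS even when JAVA_OPTS is set is an assumption.

```python
def effective_jvm_options(defaults: str, java_opts: str = "", java_options: str = "") -> str:
    """Model how the two environment variables combine with the defaults."""
    # JAVA_OPTS, when set, replaces the defaults completely.
    base = java_opts if java_opts else defaults
    # JAVA_OPTIONS is appended (assumed to apply in both cases).
    return " ".join(part for part in (base, java_options) if part)


DEFAULTS = "-Xms64m -Xmx512m"  # made-up defaults for illustration

print(effective_jvm_options(DEFAULTS, java_options="-Dexample.flag=true"))
print(effective_jvm_options(DEFAULTS, java_opts="-Xmx1024M"))
```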
17.4.1. Garbage Collection Logging
Data Grid automatically enables garbage collection logs to the $ISPN_HOME/server/log directory. Disable the logs by setting the GC_LOG environment variable to false before launching the server.
Linux
export GC_LOG="false"
Windows
set "GC_LOG=false"
17.4.2. Heap Dump on OutOfMemoryError
Data Grid enables automatic heap dumps on OutOfMemoryError to the $ISPN_HOME/server/log directory. Heap dumps are useful for troubleshooting memory issues. You can use tools such as VisualVM or Eclipse Memory Analyzer to inspect heap dumps. The jhat tool is only included in JDK 8 and earlier. Disable heap dumps by setting the HEAP_DUMP environment variable to false before launching the server.
Linux
export HEAP_DUMP="false"
Windows
set "HEAP_DUMP=false"
Heap dumps are snapshots of the in-memory data you have stored in Data Grid caches and may therefore contain sensitive data. Consider sanitizing the heap dumps before sharing them with third parties by using a tool such as the Heap Dump Tool.
17.5. Virtual Threads Support
Data Grid supports virtual threads, which can significantly improve application responsiveness and scalability under high concurrency. By default, they are enabled if you are running on JDK 21 or higher.
On systems with JDK versions prior to 24 and low CPU counts (2 or fewer), Data Grid might experience thread pinning issues when using virtual threads. Thread pinning is a situation where virtual threads are unexpectedly bound to a limited number of OS threads, potentially leading to performance degradation or system freezes.
To work around this problem, virtual threads can be disabled as described in the procedure below, or the virtual thread scheduler parallelism may be increased with the Java option -Djdk.virtualThreadScheduler.parallelism=<value>. A common starting point for <value> on low-CPU systems is 4.
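The conditions above can be expressed as a small check that suggests the scheduler option when a host falls into the affected range. The function name is hypothetical, and the default of 4 is only the starting point mentioned above, not a tuned value.

```python
import os
from typing import Optional


def pinning_workaround(jdk_major: int, cpu_count: int, parallelism: int = 4) -> Optional[str]:
    """Suggest the scheduler option for JDK 21 to 23 on hosts with 2 or fewer CPUs."""
    # Virtual threads are enabled by default from JDK 21; pinning is described for JDKs before 24.
    if 21 <= jdk_major < 24 and cpu_count <= 2:
        return f"-Djdk.virtualThreadScheduler.parallelism={parallelism}"
    return None


# Example: evaluate the current host as if it ran JDK 21.
print(pinning_workaround(21, os.cpu_count() or 1))
```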
17.5.1. Disable Virtual Threads
Disable the virtual thread pool as follows:
- Set the -Dorg.infinispan.threads.virtual=false option in the JAVA_OPTS environment variable before starting the Data Grid server.
  Alternatively, you can append this option directly to the command used to start the Data Grid server:
    bin/server.sh -Dorg.infinispan.threads.virtual=false
- Verify by checking the server logs. The following log entry should not be present:
    Virtual threads support enabled
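The verification step can be automated by scanning the log for that entry. The sketch below works on any iterable of lines; the sample log lines are made up, and the location of the server log file is left to you because it is not specified here.

```python
from typing import Iterable

MARKER = "Virtual threads support enabled"


def virtual_threads_enabled(log_lines: Iterable[str]) -> bool:
    """Return True if any log line contains the virtual threads marker."""
    return any(MARKER in line for line in log_lines)


# Made-up log lines for illustration.
with_marker = ["INFO  Starting server", "INFO  Virtual threads support enabled"]
without_marker = ["INFO  Starting server", "INFO  Server started"]
print(virtual_threads_enabled(with_marker))
print(virtual_threads_enabled(without_marker))
```

To check a real log, open the file and pass the file object, for example `virtual_threads_enabled(open(path))`, where path points to your server log.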
17.6. Accessing cluster health via REST
Get Data Grid cluster health via the REST API.
Procedure
Invoke a GET request to retrieve cluster health.
GET /rest/v2/container/health
Data Grid responds with a JSON document such as the following:
{
"cluster_health":{
"cluster_name":"ISPN",
"health_status":"HEALTHY",
"number_of_nodes":2,
"node_names":[
"NodeA-36229",
"NodeB-28703"
]
},
"cache_health":[
{
"status":"HEALTHY",
"cache_name":"___protobuf_metadata"
},
{
"status":"HEALTHY",
"cache_name":"cache2"
},
{
"status":"HEALTHY",
"cache_name":"mycache"
},
{
"status":"HEALTHY",
"cache_name":"cache1"
}
]
}
Get Cache Manager status as follows:
GET /rest/v2/container/health/status
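A health check like this is easy to script. The following sketch uses only the Python standard library: fetch_health requests the health document and unhealthy_caches reports every cache whose status is not HEALTHY. The base URL is something you supply, the request is sent without credentials (an assumption, since secured endpoints need authentication), and the DEGRADED status in the sample is illustrative.

```python
import json
import urllib.request


def fetch_health(base_url: str) -> dict:
    """GET the container health document from a server (no authentication)."""
    url = f"{base_url}/rest/v2/container/health"
    with urllib.request.urlopen(url, timeout=5) as response:
        return json.load(response)


def unhealthy_caches(health: dict) -> list[str]:
    """Return the names of caches whose status is anything other than HEALTHY."""
    return [
        cache["cache_name"]
        for cache in health.get("cache_health", [])
        if cache["status"] != "HEALTHY"
    ]


# Illustrative document in the shape shown above; the DEGRADED entry is made up.
sample = {
    "cluster_health": {"cluster_name": "ISPN", "health_status": "HEALTHY"},
    "cache_health": [
        {"status": "HEALTHY", "cache_name": "mycache"},
        {"status": "DEGRADED", "cache_name": "cache2"},
    ],
}
print(unhealthy_caches(sample))
```

Against a running server, you would call `unhealthy_caches(fetch_health(base_url))` with the address of your REST endpoint.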
Reference
See the REST v2 (version 2) API documentation for more information.
17.7. Accessing cluster health via JMX
Retrieve Data Grid cluster health statistics via JMX.
Procedure
- Connect to Data Grid server using any JMX capable tool such as JConsole and navigate to the following object:
    org.infinispan:type=CacheManager,name="default",component=CacheContainerHealth
- Select available MBeans to retrieve cluster health statistics.
17.8. Benchmark the Data Grid Server
Data Grid provides a built-in benchmark utility, accessible through the CLI, to validate server configuration changes. The benchmark runs write and read operations, submitting requests as fast as possible. Because this is a contrived scenario that might not represent a real production environment, interpret the numbers with care. Nevertheless, the approach is a simple way to verify the effect of configuration changes.
Check the CLI command documentation for more information.