Chapter 6. Deprecated metrics
The following metrics are deprecated and will be removed in a future version of AI Inference Server:
- vllm:num_requests_swapped
- vllm:cpu_cache_usage_perc
- vllm:cpu_prefix_cache_hit_rate (KV cache offloading is not used in V1).
- vllm:gpu_prefix_cache_hit_rate. This metric is replaced by queries+hits counters in V1.
- vllm:time_in_queue_requests. This metric is duplicated by vllm:request_queue_time_seconds.
- vllm:model_forward_time_milliseconds
- vllm:model_execute_time_milliseconds. Prefill, decode, or inference time metrics should be used instead.
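As an illustration, the hit rate formerly reported by the gpu_prefix_cache_hit_rate gauge can be derived in Prometheus from the two V1 counters that replace it. This is a sketch: the exact counter names exposed by your server version may differ, so check the /metrics endpoint before relying on them.

```promql
# Assumed V1 counter names: vllm:gpu_prefix_cache_hits_total and
# vllm:gpu_prefix_cache_queries_total. The ratio of their rates over a
# window approximates the old hit-rate gauge.
rate(vllm:gpu_prefix_cache_hits_total[5m])
  / rate(vllm:gpu_prefix_cache_queries_total[5m])
```

The same pattern applies to the queue-time replacement: query the vllm:request_queue_time_seconds histogram (for example with histogram_quantile over its _bucket series) instead of the deprecated vllm:time_in_queue_requests metric.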
When metrics are deprecated in version X.Y, they are hidden in version X.Y+1 but can be re-enabled by using the --show-hidden-metrics-for-version=X.Y escape hatch. Deprecated metrics are completely removed in the following version, X.Y+2.
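For example, the escape hatch is passed as a server flag at startup. The version number and model name below are illustrative only; substitute the version in which the metrics you need were deprecated.

```shell
# Hypothetical invocation: run a 0.9-series server while keeping metrics
# that were deprecated in 0.8 visible on the /metrics endpoint.
vllm serve meta-llama/Llama-3.1-8B-Instruct \
  --show-hidden-metrics-for-version=0.8
```

Treat this flag as a temporary migration aid: once the X.Y+2 release removes the metrics entirely, dashboards and alerts must be moved to the replacement metrics listed above.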