Chapter 7. Deprecated metrics
The following metrics are deprecated and will be removed in a future version of AI Inference Server:
- vllm:num_requests_swapped
- vllm:cpu_cache_usage_perc
- vllm:cpu_prefix_cache_hit_rate (KV cache offloading is not used in V1).
- vllm:gpu_prefix_cache_hit_rate. This metric is replaced by queries+hits counters in V1.
- vllm:time_in_queue_requests. This metric is duplicated by vllm:request_queue_time_seconds.
- vllm:model_forward_time_milliseconds
- vllm:model_execute_time_milliseconds. Prefill, decode or inference time metrics should be used instead.
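Before upgrading, it can help to check whether existing dashboards or alert rules still reference any of the metrics listed above. The following is a minimal sketch, not part of AI Inference Server: the function name `find_deprecated_metrics` and the suffix handling are assumptions made for illustration. It scans arbitrary text (for example a PromQL query or a scraped metrics page) for the deprecated names, also matching series suffixes such as `_sum` or `_bucket` that Prometheus clients commonly append.

```python
import re

# Metric names copied from the deprecation list above.
DEPRECATED_METRICS = {
    "vllm:num_requests_swapped",
    "vllm:cpu_cache_usage_perc",
    "vllm:cpu_prefix_cache_hit_rate",
    "vllm:gpu_prefix_cache_hit_rate",
    "vllm:time_in_queue_requests",
    "vllm:model_forward_time_milliseconds",
    "vllm:model_execute_time_milliseconds",
}

# Assumption: series may carry the usual Prometheus suffixes.
_SERIES_SUFFIXES = ("_bucket", "_sum", "_count", "_total")

# Prometheus metric names: letters, digits, underscores and colons.
_TOKEN = re.compile(r"[A-Za-z_:][A-Za-z0-9_:]*")


def _base_name(token):
    """Strip one known series suffix, if present."""
    for suffix in _SERIES_SUFFIXES:
        if token.endswith(suffix):
            return token[: -len(suffix)]
    return token


def find_deprecated_metrics(text):
    """Return the sorted deprecated metric names referenced in `text`."""
    found = set()
    for token in _TOKEN.findall(text):
        if token in DEPRECATED_METRICS:
            found.add(token)
        elif _base_name(token) in DEPRECATED_METRICS:
            found.add(_base_name(token))
    return sorted(found)


if __name__ == "__main__":
    query = "rate(vllm:time_in_queue_requests_sum[5m])"
    print(find_deprecated_metrics(query))
```

Running the helper over exported dashboard definitions gives a list of queries to migrate, for example from vllm:time_in_queue_requests to vllm:request_queue_time_seconds.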
A metric that is deprecated in version X.Y is hidden in version X.Y+1, where it can still be re-enabled by using the --show-hidden-metrics-for-version=X.Y escape hatch. The metric is removed completely in version X.Y+2.
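The deprecation schedule can be expressed as a small lookup. The sketch below is illustrative only: the function names `metric_status` and `escape_hatch_flag` and the status labels are hypothetical, and the sketch assumes that versions are plain `major.minor` strings and that both versions share the same major number, since the policy is only stated in terms of X.Y, X.Y+1, and X.Y+2.

```python
def _parse(version):
    """Split a 'major.minor' string into a tuple of two integers."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def metric_status(deprecated_in, current):
    """Classify a metric's visibility in `current`, given the version
    in which it was deprecated.

    Returns one of: "active", "deprecated", "hidden", "removed".
    """
    dep_major, dep_minor = _parse(deprecated_in)
    cur_major, cur_minor = _parse(current)
    if dep_major != cur_major:
        # Assumption: the policy above does not cover major-version jumps.
        raise ValueError("versions must share the same major number")

    offset = cur_minor - dep_minor
    if offset < 0:
        return "active"      # not yet deprecated
    if offset == 0:
        return "deprecated"  # X.Y: still exposed by default
    if offset == 1:
        return "hidden"      # X.Y+1: needs the escape hatch
    return "removed"         # X.Y+2 and later


def escape_hatch_flag(deprecated_in):
    """Build the flag that re-enables metrics hidden in X.Y+1."""
    return f"--show-hidden-metrics-for-version={deprecated_in}"


if __name__ == "__main__":
    # The version numbers here are placeholders, not real releases.
    for current in ("1.1", "1.2", "1.3", "1.4"):
        print(current, metric_status("1.2", current))
    print(escape_hatch_flag("1.2"))
```

The escape hatch only applies in the "hidden" state; once a metric reaches "removed", the flag no longer brings it back.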