Chapter 7. Deprecated metrics

The following metrics are deprecated and will be removed in a future version of AI Inference Server:

vllm:num_requests_swapped
vllm:cpu_cache_usage_perc
vllm:cpu_prefix_cache_hit_rate. KV cache offloading is not used in V1.
vllm:gpu_prefix_cache_hit_rate. This metric is replaced by separate queries and hits counters in V1.
vllm:time_in_queue_requests. This metric is duplicated by vllm:request_queue_time_seconds.
vllm:model_forward_time_milliseconds
vllm:model_execute_time_milliseconds. Use the prefill, decode, or inference time metrics instead.
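
To check whether a dashboard or alert still depends on one of these metrics, you can scan a saved copy of the metrics endpoint output for the deprecated names. The following Python example is a minimal sketch, not part of the product: the function name find_deprecated_metrics is hypothetical, and the sketch assumes that the input is Prometheus text exposition format, where counter and histogram samples can carry suffixes such as _total, _bucket, _sum, and _count.

```python
import re

# Deprecated metric names, copied from the list above.
DEPRECATED_METRICS = (
    "vllm:num_requests_swapped",
    "vllm:cpu_cache_usage_perc",
    "vllm:cpu_prefix_cache_hit_rate",
    "vllm:gpu_prefix_cache_hit_rate",
    "vllm:time_in_queue_requests",
    "vllm:model_forward_time_milliseconds",
    "vllm:model_execute_time_milliseconds",
)

# Assumption: sample names may carry these Prometheus client suffixes.
_SAMPLE_SUFFIXES = ("", "_total", "_bucket", "_sum", "_count")

# A metric name is the leading run of letters, digits, underscores, and colons.
_NAME_RE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)")


def find_deprecated_metrics(exposition_text):
    """Return the sorted deprecated metric names that appear as samples."""
    found = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        match = _NAME_RE.match(line)
        if match is None:
            continue
        name = match.group(1)
        for base in DEPRECATED_METRICS:
            if any(name == base + suffix for suffix in _SAMPLE_SUFFIXES):
                found.add(base)
    return sorted(found)
```

Pass the function the text that you saved from the metrics endpoint. An empty list means that none of the deprecated metrics appear in that scrape.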

Important

When a metric is deprecated in version X.Y, it is hidden in version X.Y+1 but can be re-enabled by using the --show-hidden-metrics-for-version=X.Y escape hatch. The metric is removed completely in the following version, X.Y+2.
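
The deprecation schedule can be worked out mechanically from the version in which a metric was deprecated. The following Python example is a minimal sketch under stated assumptions: the function name deprecation_schedule is hypothetical, version strings are assumed to have the form "major.minor" with integer components, and the version 1.4 used in the usage note is purely illustrative.

```python
def deprecation_schedule(deprecated_in):
    """Map a deprecation version "X.Y" to its hidden and removed versions.

    Assumption: the version string is "major.minor" with integer parts.
    """
    major_text, minor_text = deprecated_in.split(".")
    major, minor = int(major_text), int(minor_text)
    return {
        # Version X.Y: the metric is deprecated but still exposed.
        "deprecated": f"{major}.{minor}",
        # Version X.Y+1: the metric is hidden unless re-enabled.
        "hidden": f"{major}.{minor + 1}",
        # Version X.Y+2: the metric is removed completely.
        "removed": f"{major}.{minor + 2}",
        # Flag that re-enables the metric while it is hidden.
        "flag": f"--show-hidden-metrics-for-version={major}.{minor}",
    }
```

For example, for a metric deprecated in a hypothetical version 1.4, the function reports that the metric is hidden in 1.5, can be re-enabled there with --show-hidden-metrics-for-version=1.4, and is removed in 1.6.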
