Chapter 4. Viewing AI Inference Server metrics
vLLM exposes various metrics via the /metrics endpoint of the AI Inference Server OpenAI-compatible API server. You can start the server by using Python or by using Docker.
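For example, a minimal sketch of each approach, assuming the upstream vLLM Python entry point and the upstream vllm/vllm-openai container image on an NVIDIA GPU host (the AI Inference Server product image and its recommended options may differ):

$ python -m vllm.entrypoints.openai.api_server --model unsloth/Llama-3.2-1B-Instruct

$ docker run --gpus all -p 8000:8000 vllm/vllm-openai:latest --model unsloth/Llama-3.2-1B-Instruct

Both commands serve the OpenAI-compatible API, including the /metrics endpoint, on port 8000 by default.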
Procedure
Launch the AI Inference Server and load your model as shown in the following example. The command also exposes the OpenAI-compatible API.
$ vllm serve unsloth/Llama-3.2-1B-Instruct
Query the /metrics endpoint of the OpenAI-compatible API to get the latest metrics from the server:
$ curl http://0.0.0.0:8000/metrics
Example output
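The exact metrics, labels, and values depend on the AI Inference Server version and the loaded model. The response is returned in Prometheus text exposition format and looks similar to the following abbreviated, illustrative sample:

# TYPE vllm:num_requests_running gauge
vllm:num_requests_running{model_name="unsloth/Llama-3.2-1B-Instruct"} 0.0
# TYPE vllm:num_requests_waiting gauge
vllm:num_requests_waiting{model_name="unsloth/Llama-3.2-1B-Instruct"} 0.0
# TYPE vllm:prompt_tokens_total counter
vllm:prompt_tokens_total{model_name="unsloth/Llama-3.2-1B-Instruct"} 314.0
# TYPE vllm:generation_tokens_total counter
vllm:generation_tokens_total{model_name="unsloth/Llama-3.2-1B-Instruct"} 128.0

Because the output is plain Prometheus text, you can filter it for a single metric, for example:

$ curl -s http://0.0.0.0:8000/metrics | grep vllm:num_requests_running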