Chapter 1. About the AI Inference API server

Red Hat AI Inference provides an OpenAI-compatible API server for inference serving. You can control the behavior of the server with arguments.

The AI Inference command-line interface includes commands for serving models, running chat completions, generating text completions, benchmarking performance, and collecting environment information for troubleshooting. Each command accepts specific arguments that configure resource allocation, model loading behavior, API compatibility options, and performance tuning parameters.

You can also configure AI Inference behavior through environment variables, which is useful for containerized deployments where command-line arguments are less practical. Built-in metrics endpoints provide observability into server performance, request latency, token throughput, and resource utilization.

このコンテンツは選択した言語では利用できません。

Chapter 1. About the AI Inference API server

詳細情報

試用、購入および販売

コミュニティー

会社概要

多様性を受け入れるオープンソースの強化

Red Hat ドキュメントについて

Theme

Red Hat legal and privacy links

Red Hat legal and privacy links