此内容没有您所选择的语言版本。

Chapter 1. About the AI Inference API server


Red Hat AI Inference provides an OpenAI-compatible API server for inference serving. You can control the behavior of the server with arguments.

The AI Inference command-line interface includes commands for serving models, running chat completions, generating text completions, benchmarking performance, and collecting environment information for troubleshooting. Each command accepts specific arguments that configure resource allocation, model loading behavior, API compatibility options, and performance tuning parameters.

You can also configure AI Inference behavior through environment variables, which is useful for containerized deployments where command-line arguments are less practical. Built-in metrics endpoints provide observability into server performance, request latency, token throughput, and resource utilization.

Red Hat logoGithubredditYoutubeTwitter

学习

尝试、购买和销售

社区

關於紅帽

我们提供强化的解决方案,使企业能够更轻松地跨平台和环境(从核心数据中心到网络边缘)工作。

让开源更具包容性

红帽致力于替换我们的代码、文档和 Web 属性中存在问题的语言。欲了解更多详情,请参阅红帽博客.

关于红帽文档

Legal Notice

Theme

© 2026 Red Hat
返回顶部