Preface
You can run inference on large language models with Red Hat AI Inference Server without any connection to the external internet by installing OpenShift Container Platform and configuring a mirrored container image registry in the disconnected environment.
Currently, only NVIDIA CUDA AI accelerators are supported for OpenShift Container Platform in disconnected environments.
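As a rough sketch of the mirroring step, you can copy a container image from a Red Hat registry into your disconnected mirror registry with `skopeo`. The image path and the mirror registry hostname below are illustrative placeholders, not the exact values for your environment; consult the Red Hat AI Inference Server release notes for the correct image reference.

```shell
# Authenticate to both registries first, for example:
#   skopeo login registry.redhat.io
#   skopeo login mirror.example.com
#
# Copy the image into the disconnected mirror registry.
# <image-path> and mirror.example.com are placeholders for your
# actual image reference and internal registry hostname.
skopeo copy \
  docker://registry.redhat.io/<image-path>:<tag> \
  docker://mirror.example.com/<image-path>:<tag>
```

Cluster nodes in the disconnected environment then pull the image from `mirror.example.com` instead of the external registry.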