Chapter 1. About model serving


Serving trained models on Red Hat OpenShift AI means deploying the models on your OpenShift cluster to test them and then integrate them into intelligent applications. Deploying a model makes it available as a service that you can access by using an API. You can then send data inputs to the service through API calls and receive predictions in return. This process is known as model inferencing. When you serve a model on OpenShift AI, the dashboard shows the inference endpoints that you can access for the deployed model.
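For example, after a model is deployed you can query its inference endpoint over REST. The following is a minimal sketch that assumes the deployed model's serving runtime exposes the KServe V2 (Open Inference Protocol) REST API; the endpoint URL, model name, and token are hypothetical placeholders that you would replace with the values shown in the dashboard for your deployment.

```python
# Minimal inference-call sketch (assumes a V2/Open Inference Protocol REST endpoint).
import requests

ENDPOINT = "https://example-model-my-project.apps.example.com"  # hypothetical, from the dashboard
MODEL_NAME = "example-model"                                    # hypothetical model name
TOKEN = "replace-with-your-token"                               # only needed if token auth is enabled

# Example payload: one input tensor of four FP32 features.
payload = {
    "inputs": [
        {
            "name": "input-0",
            "shape": [1, 4],
            "datatype": "FP32",
            "data": [5.1, 3.5, 1.4, 0.2],
        }
    ]
}

response = requests.post(
    f"{ENDPOINT}/v2/models/{MODEL_NAME}/infer",
    json=payload,
    headers={"Authorization": f"Bearer {TOKEN}"},
    timeout=30,
)
response.raise_for_status()
print(response.json()["outputs"])  # the model's predictions
```

The exact request format depends on the serving runtime you deploy the model with, so check the runtime's documentation for the protocol it supports.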

OpenShift AI provides the following model serving platforms:

Single-model serving platform
For deploying large models such as large language models (LLMs), OpenShift AI includes a single-model serving platform that is based on the KServe component. Because each model is deployed on its own model server, the single-model serving platform helps you to deploy, monitor, scale, and maintain large models that require increased resources.
Multi-model serving platform
For deploying small and medium-sized models, OpenShift AI includes a multi-model serving platform that is based on the ModelMesh component. On the multi-model serving platform, you can deploy multiple models on the same model server, and the deployed models share the server resources. This approach can be advantageous on OpenShift clusters that have limited compute resources or pods. A deployment sketch covering both platforms follows this list.
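The following is a minimal sketch of how a deployment to either platform might be created programmatically, assuming the Kubernetes Python client and a KServe-style InferenceService custom resource. The namespace, model name, model format, and storage URI are hypothetical placeholders, and OpenShift AI may expect additional fields or labels that are not shown here; the dashboard is the supported way to deploy models.

```python
# Sketch: creating an InferenceService with the Kubernetes Python client.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() when running in a pod
api = client.CustomObjectsApi()

inference_service = {
    "apiVersion": "serving.kserve.io/v1beta1",
    "kind": "InferenceService",
    "metadata": {
        "name": "example-model",                     # hypothetical model name
        "namespace": "my-data-science-project",      # hypothetical project namespace
        # Uncommenting the annotation below targets the multi-model (ModelMesh)
        # platform; without it, the resource targets the single-model (KServe) platform.
        # "annotations": {"serving.kserve.io/deploymentMode": "ModelMesh"},
    },
    "spec": {
        "predictor": {
            "model": {
                "modelFormat": {"name": "onnx"},     # hypothetical model format
                "storageUri": "s3://example-bucket/models/example-model",  # hypothetical
            }
        }
    },
}

api.create_namespaced_custom_object(
    group="serving.kserve.io",
    version="v1beta1",
    namespace="my-data-science-project",
    plural="inferenceservices",
    body=inference_service,
)
```

On the single-model serving platform, each InferenceService runs on its own model server, whereas the ModelMesh deployment mode shown in the comment places the model on a shared server alongside other models.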