Chapter 4. Deploying a model from the model catalog
You can deploy models directly from the model catalog.
OpenShift AI model serving deployments use the global cluster pull secret to pull models in OCI-compliant ModelCar format from the catalog.
For more information about using pull secrets in OpenShift, see Updating the global cluster pull secret in the OpenShift documentation.
Prerequisites
- You have completed the prerequisites in Deploying models.
- The model registry component is enabled in your OpenShift AI deployment. For more information, see Enabling the model registry component.
Procedure
- From the OpenShift AI dashboard, click AI hub → Catalog.
- The Catalog page provides a high-level view of available models, including the model category, name, description, and labels such as task, license, and provider.
- You can use the search bar to search by model name, description, or provider.
- You can use the filter menu to search and select filters by task, provider, or license.
- Click the name of a model to view the model details page.
- Click Deploy model to display the Deploy a model wizard.
On the Model details page, in the Model type field, you can select Generative AI model or Predictive model. The default model type in the catalog is generative.
Note: For models in the catalog, the Model details page displays read-only information in the Model location and URI fields.
In the Model deployment page, configure the deployment as follows:
- From the Project list, select the project in which to deploy your model.
In the Model deployment name field, enter a unique name for your model deployment. This field is autofilled with the model name by default.
This is the name of the inference service created when the model is deployed.
Optional: Click Edit resource name, and enter a specific name in the Resource name field. By default, the resource name matches the name of the model deployment.
Important: Resource names are what your resources are labeled as in OpenShift. Your resource name cannot exceed 253 characters, must consist of lowercase alphanumeric characters or -, and must start and end with an alphanumeric character. You cannot edit resource names after creation.
The resource name must not match the name of any other model deployment resource in your OpenShift cluster.
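The naming rules above can be checked before you submit the form; a minimal validation sketch in Python (the regular expression and function name are illustrative, not part of OpenShift AI):

```python
import re

# Matches the rules above: lowercase alphanumeric characters or "-",
# starting and ending with an alphanumeric character.
_RESOURCE_NAME_RE = re.compile(r"^[a-z0-9]([-a-z0-9]*[a-z0-9])?$")

def is_valid_resource_name(name: str) -> bool:
    """Return True if the name satisfies the resource-name constraints."""
    return len(name) <= 253 and bool(_RESOURCE_NAME_RE.match(name))
```

For example, `my-model-v2` passes, while `My-Model` (uppercase) and `-model` (leading hyphen) do not.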
- In the Description field, enter a description of your deployment.
- From the Hardware profile list, select a hardware profile. Models provided in the catalog use the default-profile.
- Optional: To modify the default resource allocation, click Customize resource requests and limits and enter new values for the CPU and memory requests and limits.
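Customized requests and limits ultimately appear as standard Kubernetes container resources on the deployment; an illustrative fragment (the values shown are examples, not product defaults):

```yaml
resources:
  requests:
    cpu: "2"
    memory: 8Gi
  limits:
    cpu: "4"
    memory: 10Gi
```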
In the Serving runtime field, select one of the following options:
- Auto-select the best runtime for your model based on model type, model format, and hardware profile: The system analyzes the selected model framework and your available hardware profiles to recommend a serving runtime.
- Select from a list of serving runtimes, including custom ones: Select this option to manually choose a runtime from the list of global and project-scoped serving runtime templates.
For more information about how the system determines the best runtime and administrator overrides, see Automatic selection of serving runtimes.
- Optional: For predictive AI models only, you can select a framework from the Model framework (name - version) list. This field is not displayed for generative AI models.
- In the Number of model server replicas to deploy field, specify a value.
- Click Next.
On the Advanced settings page, configure the following options:
- Select the Add as AI asset endpoint checkbox if you want to add your gen AI model endpoint to the Gen AI studio → AI asset endpoints page.
- In the Use case field, enter the types of tasks that your model performs, such as chat, multimodal, or natural language processing.
Note: You must add your model as an AI asset endpoint to test your model on the Gen AI studio → Playground page.
- To require token authentication for inference requests to the deployed model, select Require token authentication.
- In the Service account name field, enter the service account name for which the token will be generated.
- To add an additional service account, click Add a service account and enter another service account name.
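When token authentication is enabled, inference clients must send the generated token as a bearer token. A sketch in Python that builds such a request without sending it; the endpoint path and payload shape are illustrative (a vLLM-style chat completions endpoint is assumed), and the URL and token are placeholders:

```python
import json
import urllib.request

def build_inference_request(url: str, token: str, payload: dict) -> urllib.request.Request:
    """Build an authenticated POST request for a deployed model endpoint."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",  # service account token
            "Content-Type": "application/json",
        },
        method="POST",
    )
```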
In the Configuration parameters section:
- Select Add custom runtime arguments, and then enter arguments in the text field.
- Select Add custom runtime environment variables, and then click Add variable to enter custom variables in the text field.
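Custom runtime arguments and environment variables are passed to the serving runtime container. An illustrative fragment showing how they might appear on the container spec (the flag and values here are examples only, not recommendations):

```yaml
containers:
- name: kserve-container
  args:
  - --max-model-len=4096   # example runtime argument
  env:
  - name: HF_HOME          # example environment variable
    value: /tmp/hf_home
```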
In the Deployment strategy section, select one of the following options:
- Rolling update: Existing inference service pods are terminated after new pods are started. This ensures zero downtime and continuous availability. This is the default option.
- Recreate: Existing inference service pods are terminated before new pods are started. This saves resources but incurs a period of downtime.
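These two options correspond to the standard Kubernetes Deployment strategy types; a sketch of the equivalent fragment on the underlying Deployment (assuming the model is served as a Kubernetes Deployment):

```yaml
spec:
  strategy:
    type: RollingUpdate   # default; use Recreate to free resources during updates
```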
- On the Review page, review the settings that you have selected before deploying the model.
- Click Deploy model.
Verification
The model deployment is displayed in the following places in the dashboard:
- The AI hub → Deployments page.
- The Latest deployments section of the model details page.
- The Deployments tab for the model version.
- The AI hub