Chapter 4. Deploying a model from the model catalog


You can deploy models directly from the model catalog.

Note

OpenShift AI model serving deployments use the global cluster pull secret to pull models in OCI-compliant ModelCar format from the catalog.

For more information about using pull secrets in OpenShift, see Updating the global cluster pull secret in the OpenShift documentation.
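If the global pull secret does not yet contain credentials for the model registry, a cluster administrator can update it with the oc CLI. The sketch below captures the relevant commands in a variable and prints them, rather than executing them, so it runs without cluster access; the file name pull-secret.json is an illustrative placeholder.

```shell
# Sketch: updating the global cluster pull secret (requires cluster-admin).
# Commands are printed, not executed, so this runs without a cluster;
# pull-secret.json is a placeholder file name.
PULL_SECRET_CMDS=$(cat <<'EOF'
# Extract the current global pull secret to a local file:
oc get secret/pull-secret -n openshift-config \
  --template='{{index .data ".dockerconfigjson" | base64decode}}' > pull-secret.json

# After merging your registry credentials into pull-secret.json, apply it back:
oc set data secret/pull-secret -n openshift-config \
  --from-file=.dockerconfigjson=pull-secret.json
EOF
)
echo "$PULL_SECRET_CMDS"
```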

Prerequisites

Procedure

  1. From the OpenShift AI dashboard, click AI hub → Catalog.
  2. On the Catalog page, review the available models. The page provides a high-level view of each model, including its category, name, description, and labels such as task, license, and provider.
  3. Optional: Use the search bar to search by model name, description, or provider.
  4. Optional: Use the filter menu to filter models by task, provider, or license.
  5. Click the name of a model to view the model details page.
  6. Click Deploy model to display the Deploy a model wizard.
  7. On the Model details page, in the Model type field, you can select Generative AI model or Predictive model. The default model type in the catalog is generative.

    Note

    For models in the catalog, the Model details page displays read-only information in the Model location and URI fields.

  8. On the Model deployment page, configure the deployment as follows:

    1. From the Project list, select the project in which to deploy your model.
    2. In the Model deployment name field, enter a unique name for your model deployment. This field is autofilled with the model name by default.

      This is the name of the inference service created when the model is deployed.

    3. Optional: Click Edit resource name, and enter a specific name in the Resource name field. By default, the resource name matches the name of the model deployment.

      Important

      Resource names identify your resources in OpenShift. A resource name cannot exceed 253 characters, must consist of lowercase alphanumeric characters or hyphens (-), and must start and end with an alphanumeric character. You cannot edit a resource name after creation.

      The resource name must not match the name of any other model deployment resource in your OpenShift cluster.
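      These naming rules follow the standard Kubernetes DNS-subdomain convention, so a proposed name can be checked locally before you submit the form. The following is a rough sketch; the sample names are hypothetical.

```shell
# Check a proposed resource name against the rules above: at most 253
# characters, lowercase alphanumerics or '-', alphanumeric at both ends.
is_valid_resource_name() {
  name="$1"
  [ "${#name}" -le 253 ] || return 1
  printf '%s' "$name" | grep -Eq '^[a-z0-9]([a-z0-9-]*[a-z0-9])?$'
}

is_valid_resource_name "granite-chat-v1" && echo "granite-chat-v1: valid"
is_valid_resource_name "My_Model" || echo "My_Model: invalid"
```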

    4. In the Description field, enter a description of your deployment.
    5. From the Hardware profile list, select a hardware profile. Models provided in the catalog use the default-profile.
    6. Optional: To modify the default resource allocation, click Customize resource requests and limits and enter new values for the CPU and memory requests and limits.
    7. In the Serving runtime field, select one of the following options:

      • Auto-select the best runtime for your model based on model type, model format, and hardware profile

        The system analyzes the selected model framework and your available hardware profiles to recommend a serving runtime.

      • Select from a list of serving runtimes, including custom ones

        Select this option to manually choose a runtime from the list of global and project-scoped serving runtime templates.

        For more information about how the system determines the best runtime and administrator overrides, see Automatic selection of serving runtimes.

    8. Optional: For predictive AI models only, you can select a framework from the Model framework (name - version) list. This field is not displayed for generative AI models.
    9. In the Number of model server replicas to deploy field, specify a value.
    10. Click Next.
  9. On the Advanced settings page, configure the following options:

    • Select the Add as AI asset endpoint checkbox if you want to add your gen AI model endpoint to the Gen AI studio AI asset endpoints page.

      • In the Use case field, enter the types of tasks that your model performs, such as chat, multimodal, or natural language processing.

        Note

        You must add your model as an AI asset endpoint to test your model on the Gen AI studio playground page.

    • To require token authentication for inference requests to the deployed model, select Require token authentication.
    • In the Service account name field, enter the name of the service account for which the token is generated.

      • To add an additional service account, click Add a service account and enter another service account name.
    • In the Configuration parameters section:

      • Select Add custom runtime arguments, and then enter arguments in the text field.
      • Select Add custom runtime environment variables, and then click Add variable to enter custom variables in the text field.
    • In the Deployment strategy section, select one of the following options:

      • Rolling update: Existing inference service pods are terminated after new pods are started. This ensures zero downtime and continuous availability. This is the default option.
      • Recreate: Existing inference service pods are terminated before new pods are started. This option uses fewer resources but incurs a period of downtime during the update.
  10. On the Review page, review the settings that you have selected before deploying the model.
  11. Click Deploy model.
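If you enabled token authentication in the advanced settings, inference requests must carry the token as a bearer header. The sketch below prints an example request rather than executing it; the endpoint URL, project, service account name, and the OpenAI-compatible /v1/completions path are assumptions to verify against your runtime and deployment.

```shell
# Sketch: authenticated inference request. Commands are printed, not executed,
# so the sketch is self-contained; all angle-bracket values are placeholders.
INFER_CMDS=$(cat <<'EOF'
# One way to obtain a token for the configured service account:
TOKEN=$(oc create token <service_account_name> -n <project>)

# Send a request to the model endpoint (the /v1/completions path assumes an
# OpenAI-compatible generative runtime; check your runtime's API):
curl -k https://<inference_endpoint>/v1/completions \
  -H "Authorization: Bearer ${TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{"model": "<model_name>", "prompt": "Hello", "max_tokens": 20}'
EOF
)
echo "$INFER_CMDS"
```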

Verification

  • The model deployment is displayed in the following places in the dashboard:

    • The AI hub → Deployments page.
    • The Latest deployments section of the model details page.
    • The Deployments tab for the model version.
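You can also verify the deployment from the CLI: OpenShift AI creates a KServe InferenceService resource for each model deployment. The sketch below prints the command rather than executing it; the project name is a placeholder.

```shell
# Sketch: CLI verification of a deployment. The command is printed rather
# than executed so the sketch runs without cluster access.
VERIFY_CMD='oc get inferenceservice -n <project>'
echo "$VERIFY_CMD"
# When the deployment is healthy, the READY column reports True.
```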