Working with the model catalog
Working with the model catalog in Red Hat OpenShift AI Self-Managed
Preface
As a data scientist or AI engineer in OpenShift AI, you can discover and evaluate the generative AI models that are available in the model catalog. From the model catalog, you can select the models that you want to register, deploy, and customize.
Chapter 1. Overview of the model catalog and model registries
The model catalog provides a curated library where data scientists and AI engineers can discover and evaluate the available generative AI (gen AI) models to find the best fit for their use cases.
A model registry acts as a central repository for administrators, data scientists, and AI engineers to register, version, and manage the lifecycle of AI models before configuring them for deployment. A model registry is a key component for AI model governance.
1.1. Model catalog
Data scientists and AI engineers can use the model catalog to discover and evaluate the gen AI models that are available and ready for their organization to register, deploy, and customize.
The model catalog provides models from different providers that you can search, discover, and evaluate before you register models in a model registry and deploy them to a model serving runtime. Third-party gen AI models are benchmarked by Red Hat for performance and quality by using open-source evaluation datasets. You can compare performance metrics for specific hardware configurations and determine the most suitable option for deployment.
OpenShift AI provides a default model catalog, which includes models from providers such as Red Hat, IBM, Meta, Nvidia, Mistral AI, and Google. OpenShift AI administrators can configure the available repository sources for models displayed in the model catalog.
For more information about how data scientists and AI engineers can use the model catalog, see Working with the model catalog.
1.2. Model registry
A model registry is an important component in the lifecycle of an artificial intelligence/machine learning (AI/ML) model, and is a vital part of any machine learning operations (MLOps) platform or workflow. A model registry acts as a central repository, storing metadata related to machine learning models from development to deployment. This metadata ranges from high-level information like the deployment environment and project, to specific details like training hyperparameters, performance metrics, and deployment events.
A model registry acts as a bridge between model experimentation and serving, offering a secure, collaborative metadata store interface for stakeholders in the ML lifecycle. Model registries provide a structured and organized way to store, share, version, deploy, and track models.
OpenShift AI administrators can create model registries in OpenShift AI and grant model registry access to data scientists and AI engineers. For more information, see Managing model registries.
Data scientists and AI engineers with access to a model registry can use it to store, share, version, deploy, and track models. For more information, see Working with model registries.
Chapter 2. Discovering and evaluating models in the model catalog
You can discover and evaluate the available gen AI models in the model catalog to find the best fit for your use cases. You can select from available model categories, search by text, and filter by labels.
For validated models, you can view performance benchmark data for specific hardware configurations to evaluate and compare options for deployment.
Prerequisites
- You are logged in to Red Hat OpenShift AI.
- The model registry component is enabled in your OpenShift AI deployment. For more information, see Enabling the model registry component.
Procedure
- From the OpenShift AI dashboard, click AI hub → Catalog.
- The Catalog page provides a high-level view of available models, including the model category, name, description, and labels such as task, license, and provider. You can also view performance benchmarks for validated models from third parties.
In the menu bar, select from the available model categories:
- All models: All models available in the model catalog.
- Red Hat AI models: Models provided and supported by Red Hat.
- Red Hat AI validated models: Third-party models benchmarked by Red Hat for performance and quality by using open-source evaluation datasets.
- Other models: Custom third-party and community models configured by your administrator that do not have any catalog source labels. This category is only displayed if there are catalog sources without labels. Otherwise, custom models with labels configured by your administrator are displayed in a category with the same name as the label set for the custom catalog source.
- You can use the search bar to find a model in the catalog. Enter text to search by model name, description, or provider.
You can use the filter menu to search and select filters by the following labels:
- Task: For example, Text-generation.
- Provider: For example, Meta.
- License: For example, Apache 2.0.
- Language: For example, Japanese.
- Click the name of a model to view the model details page. This page displays the model description and the Model card information supplied by the model provider. This includes details such as the model’s intended use and potential limitations, training parameters and datasets, and evaluation results.
For validated models, click the Performance Insights tab to view performance benchmark data to compare performance metrics for specific hardware configurations and to determine the most suitable options for deployment.
You can filter the performance data by the following options:
- Workload type: Select a workload type to view performance under different token lengths, for example, Chatbot.
- Max latency: Set your maximum acceptable latency. Hardware configurations that respond slower than this value are hidden. You can select a specific metric in the list:
  - E2E (end-to-end request latency): The time taken from submitting the request to receiving the final response.
  - TTFT (time to first token): The time that the user must wait before seeing output from the model.
  - TPS (tokens per second): The total number of tokens that are output per second.
  - ITL (inter-token latency): The average time taken between consecutive tokens.
  You can also select a percentile value, for example, P90. Use the slider to set the maximum acceptable latency value in milliseconds, and click Apply filter.
- Min RPS: Set the minimum number of requests per second that a model must handle. Hardware configurations that perform below this value are hidden. Use the slider to set the minimum requests per second value, and click Apply filter.
- Hardware type: Select one or more hardware types from the list, for example, H200.

You can click Clear all filters to reset your filters and try again.
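To make the latency metrics above concrete, the following is an illustrative sketch (not OpenShift AI code, and the timestamp values are hypothetical). It derives E2E, TTFT, ITL, and TPS from the per-token timestamps of one streamed response, and computes a nearest-rank percentile such as P90 across multiple requests.

```python
import math

def latency_metrics(request_time, token_times):
    """Compute E2E, TTFT, ITL, and TPS for one streamed request.

    request_time: when the request was submitted (seconds).
    token_times: when each output token arrived (seconds, ascending).
    """
    e2e = token_times[-1] - request_time   # end-to-end request latency
    ttft = token_times[0] - request_time   # time to first token
    gaps = [b - a for a, b in zip(token_times, token_times[1:])]
    itl = sum(gaps) / len(gaps)            # mean inter-token latency
    tps = len(token_times) / e2e           # tokens output per second
    return {"E2E": e2e, "TTFT": ttft, "ITL": itl, "TPS": tps}

def percentile(values, p):
    """Nearest-rank percentile, e.g. p=90 for P90 across requests."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

# Hypothetical request: submitted at t=0, four tokens streamed back.
metrics = latency_metrics(0.0, [0.25, 0.30, 0.35, 0.40])
print(metrics["TTFT"])  # 0.25
print(metrics["E2E"])   # 0.4
# P90 end-to-end latency (in ms) across five hypothetical requests:
print(percentile([120, 80, 95, 200, 150], 90))  # 200
```

A P90 value of 200 ms here means that 90% of the sampled requests completed in 200 ms or less, which is why filtering on a percentile hides hardware configurations whose tail latency exceeds your threshold.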
- For categories with more than 10 models, you can click Load more models to scroll and view additional models available in the catalog. Repeat this step until all models are loaded.
Verification
- For all models, you can view the information about a selected model on the model details page.
- For validated models, you can view the benchmark information about a selected model on the Performance Insights tab.
Chapter 3. Registering a model from the model catalog
As a data scientist or AI engineer, you can register models directly from the model catalog and create the first version of the new model.
Prerequisites
- You are logged in to Red Hat OpenShift AI.
- You have access to an available model registry in your deployment.
Procedure
- From the OpenShift AI dashboard, click AI hub → Catalog.
- The Catalog page provides a high-level view of available models, including the model category, name, description, and labels such as task, license, and provider.
- You can use the search bar to search by model name, description, or provider.
- You can use the filter menu to search and select filters by task, provider, or license.
- Click the name of a model to view the model details page.
- Click Register model.
- From the Model registry drop-down list, select the model registry that you want to register the model in.
In the Model details section, configure details to apply to all versions of the model:
- Optional: In the Model name field, update the name of the model.
- Optional: In the Model description field, update the description of the model.
In the Version details section, enter details to apply to the first version of the model:
- In the Version name field, enter a name for the model version.
- Optional: In the Version description field, enter a description for the first version of the model.
- In the Source model format field, enter the name of the model format, for example, ONNX.
- In the Source model format version field, enter the version of the model format.
- In the Model location section, the URI of the model is displayed.
- Click Register model.
Verification
- The new model details and version are displayed on the Overview tab on the model details page.
- The new model and version are displayed on the Model registry page.
Chapter 4. Deploying a model from the model catalog
You can deploy models directly from the model catalog.
OpenShift AI model serving deployments use the global cluster pull secret to pull models in OCI-compliant ModelCar format from the catalog.
For more information about using pull secrets in OpenShift, see Updating the global cluster pull secret in the OpenShift documentation.
Prerequisites
- You have completed the prerequisites in Deploying models.
- The model registry component is enabled in your OpenShift AI deployment. For more information, see Enabling the model registry component.
Procedure
- From the OpenShift AI dashboard, click AI hub → Catalog.
- The Catalog page provides a high-level view of available models, including the model category, name, description, and labels such as task, license, and provider.
- You can use the search bar to search by model name, description, or provider.
- You can use the filter menu to search and select filters by task, provider, or license.
- Click the name of a model to view the model details page.
- Click Deploy model to display the Deploy a model wizard.
On the Model details page, in the Model type field, you can select Generative AI model or Predictive model. The default model type in the catalog is generative.
Note: For models in the catalog, the Model details page displays read-only information in the Model location and URI fields.
On the Model deployment page, configure the deployment as follows:
- From the Project list, select the project in which to deploy your model.
- In the Model deployment name field, enter a unique name for your model deployment. This field is autofilled with the model name by default. This is the name of the inference service created when the model is deployed.
- Optional: Click Edit resource name, and enter a specific name in the Resource name field. By default, the resource name matches the name of the model deployment.
  Important: Resource names are what your resources are labeled as in OpenShift. Your resource name cannot exceed 253 characters, must consist of lowercase alphanumeric characters or -, and must start and end with an alphanumeric character. You cannot edit resource names after creation. The resource name must not match the name of any other model deployment resource in your OpenShift cluster.
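The resource naming rules can be expressed as a simple mechanical check. The following sketch is illustrative only (it is not part of OpenShift AI); it encodes exactly the constraints stated above: at most 253 characters, lowercase alphanumeric characters or -, and starting and ending with an alphanumeric character.

```python
import re

# Matches a name that starts and ends with a lowercase alphanumeric
# character, with only lowercase alphanumerics or "-" in between.
NAME_RE = re.compile(r"[a-z0-9]([a-z0-9-]*[a-z0-9])?")

def is_valid_resource_name(name: str) -> bool:
    """Check a candidate resource name against the stated rules."""
    return len(name) <= 253 and NAME_RE.fullmatch(name) is not None

# Hypothetical example names:
print(is_valid_resource_name("granite-7b-deployment"))  # True
print(is_valid_resource_name("Granite-7B"))             # False: uppercase
print(is_valid_resource_name("-model"))                 # False: leading "-"
```

Checking a name before you create the deployment avoids a failed creation, because the resource name cannot be edited afterward.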
- In the Description field, enter a description of your deployment.
- From the Hardware profile list, select a hardware profile. Models provided in the catalog use the default-profile.
- Optional: To modify the default resource allocation, click Customize resource requests and limits and enter new values for the CPU and memory requests and limits.
In the Serving runtime field, select one of the following options:
- Auto-select the best runtime for your model based on model type, model format, and hardware profile: The system analyzes the selected model framework and your available hardware profiles to recommend a serving runtime.
- Select from a list of serving runtimes, including custom ones: Select this option to manually choose a runtime from the list of global and project-scoped serving runtime templates.
For more information about how the system determines the best runtime and administrator overrides, see Automatic selection of serving runtimes.
- Optional: For predictive AI models only, you can select a framework from the Model framework (name - version) list. This field is not displayed for generative AI models.
- In the Number of model server replicas to deploy field, specify a value.
- Click Next.
On the Advanced settings page, configure the following options:
Select the Add as AI asset endpoint checkbox if you want to add your gen AI model endpoint to the Gen AI studio → AI asset endpoints page.
In the Use case field, enter the types of tasks that your model performs, such as chat, multimodal, or natural language processing.
Note: You must add your model as an AI asset endpoint to test your model on the Gen AI studio → playground page.
- To require token authentication for inference requests to the deployed model, select Require token authentication.
In the Service account name field, enter the service account name that the token will be generated for.
- To add an additional service account, click Add a service account and enter another service account name.
In the Configuration parameters section:
- Select Add custom runtime arguments, and then enter arguments in the text field.
- Select Add custom runtime environment variables, and then click Add variable to enter custom variables in the text field.
In the Deployment strategy section, select one of the following options:
- Rolling update: Existing inference service pods are terminated after new pods are started. This ensures zero downtime and continuous availability. This is the default option.
- Recreate: Existing inference service pods are terminated before new pods are started. This saves resources but causes a period of downtime.
- On the Review page, review the settings that you have selected before deploying the model.
- Click Deploy model.
Verification
The model deployment is displayed in the following places in the dashboard:
- The AI hub → Deployments page.
- The Latest deployments section of the model details page.
- The Deployments tab for the model version.
Chapter 5. Configuring model catalog sources in OpenShift
As an OpenShift cluster administrator, you can configure a custom model catalog source by adding an entry to the model-catalog-sources config map in OpenShift. This entry provides details such as the catalog name and the location of the catalog source file that specifies your model definitions.
Prerequisites
- You are logged in to the OpenShift web console with cluster administrator privileges.
- You have created a catalog source file that specifies your model definitions. For an example catalog source file, see the sample-catalog.yaml file from Kubeflow Model Registry.
Procedure
- In the OpenShift web console navigation menu, click Workloads → ConfigMaps.
- In the Project list, enter rhoai-model-registries, and click the project name.
- Click the model-catalog-sources config map name.
- Click the YAML tab to view the contents of the config map YAML file.
- Find the data: sources.yaml section, and add a new entry under the catalogs field. For example, you can add a sample-catalog.yaml file to the same config map as a sibling of sources.yaml and reference it in the yamlCatalogPath property.
The fields of the catalog source entry are described as follows:
- name: The user-friendly name of the catalog source.
- id: The unique ID of the catalog source.
- type: The catalog source type. Use yaml.
- enabled: Whether the catalog source is enabled. Defaults to true.
- properties.yamlCatalogPath: The location of your catalog source file.
- labels: Optional labels for your catalog source to display in the catalog.
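Assembled from the field descriptions above, a catalog source entry in sources.yaml might look like the following sketch. The name, id, label, and file path values are illustrative placeholders, not required values.

```yaml
# Sketch of a catalogs entry in the model-catalog-sources config map.
# The name, id, labels, and yamlCatalogPath values are placeholders.
catalogs:
  - name: Sample Catalog
    id: sample_catalog
    type: yaml
    enabled: true
    properties:
      yamlCatalogPath: sample-catalog.yaml
    labels:
      - Sample
```

With a label of Sample, the models from this source would appear in the catalog under a "Sample models" category; with no labels, they would appear under "Other models".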
- Click Save.
Verification
- In the OpenShift AI dashboard, click AI hub → Catalog.
In the menu bar, view models from your new catalog source:
- If your catalog source has no label, click Other models.
- If your catalog source has a label, click <your-label-name> models.

Note: These changes might take a few minutes to be displayed in the catalog on the dashboard.