Chapter 1. Deploy and manage Models-as-a-Service
You can deploy Models-as-a-Service (MaaS) to provide subscription-based governance for large language model serving. With MaaS, you can define subscriptions that grant groups access to models with token limits, control access through API key authentication, and track resource consumption for cost allocation.
1.1. Models-as-a-Service overview
In Red Hat OpenShift AI, Models-as-a-Service (MaaS) provides subscription-based governance for large language model (LLM) serving across your organization. This platform helps you manage resource consumption and governance challenges when you serve models to a large user base.
In OpenShift AI 3.4, MaaS uses a subscription-based model for quota management. This replaces the tier-based model used in OpenShift AI 3.3. Subscriptions provide enhanced flexibility and are managed through custom resources instead of ConfigMaps.
As an administrator, you can use this subscription-based system to expose models through managed API endpoints. With this structure, you can enforce different consumption policies for different user groups and deliver AI models as shared resources with appropriate access levels.
The Models-as-a-Service platform acts as a governance layer between users and model serving infrastructure. You can enforce centralized policies without modifying the underlying model serving components.
Models-as-a-Service provides the following capabilities:
- Subscription-based quota management
- Define multiple subscriptions that grant specific groups quota for models with configurable token limits. Users can belong to multiple subscriptions, with priority levels determining which subscription is used when multiple options are available.
- Self-service API key management
- Users can create, list, and revoke their own API keys for model access. Administrators can also provision and manage API keys on behalf of users. API keys can be permanent or configured with custom expiration times, and individual keys can be revoked without affecting other keys for the same user.
- Multi-runtime support
- Expose models served with llm-d or vLLM runtimes through MaaS governance. You can apply consistent governance across different serving infrastructures. vLLM runtime support is a Technology Preview feature in Red Hat OpenShift AI.
- Policy and quota management
- Enforce token limit policies to prevent resource exhaustion.
- Usage tracking and observability
- Monitor subscription-level token consumption, request counts, and rate-limit violations through the MaaS observability dashboard. Track consumption metrics for cost allocation and billing. Export usage data in CSV format for cost attribution and showback reporting to finance teams. The MaaS observability dashboard is a Technology Preview feature in Red Hat OpenShift AI.
- External models
- Route inference requests to models hosted by external cloud providers such as AWS Bedrock, Azure OpenAI, or Google Vertex AI through the same MaaS gateway used for locally deployed models. External models is a Technology Preview feature in Red Hat OpenShift AI.
- External OpenID Connect (OIDC) authentication
- Integrate with external OIDC identity providers for user authentication to provide enterprise-wide access without requiring OpenShift user accounts. External OIDC authentication is a Technology Preview feature in Red Hat OpenShift AI.
The following table summarizes when MaaS is the right choice and when standard model serving is sufficient.
| MaaS | Standard model serving |
|---|---|
| Centralized governance across multiple teams or projects is required. | You are deploying models for single-team or single-user use cases. |
| You need token limit enforcement and usage tracking for cost control. | You are prototyping or developing models in a single-user environment where governance overhead is unnecessary. |
| You prefer declarative configuration management via GitOps. | Simplified deployment is preferred over centralized control. |
MaaS administration is divided into initial configuration and ongoing management, with distinct responsibilities for cluster administrators and OpenShift AI administrators.
| Phase | Cluster administrators | OpenShift AI administrators |
|---|---|---|
| Initial configuration | | |
| Ongoing operations | | |
1.1.1. Models-as-a-Service custom resources
Models-as-a-Service (MaaS) uses Kubernetes custom resources for declarative configuration management. You can integrate MaaS with GitOps workflows and version control. The platform uses the following custom resource types:
- Tenant
- Configures tenant-specific settings including API key expiration limits, external OIDC authentication, telemetry options, and gateway references.
- MaaSModelRef
- References inference servers served through OpenShift AI. Models can be served using llm-d distributed inference, vLLM runtimes, or external LLM providers.
- ExternalModel
- Defines external LLM provider configurations for models hosted outside the cluster, such as OpenAI or Anthropic. You can apply MaaS governance to third-party LLM services.
- MaaSSubscription
- Defines subscription-based quota by specifying which groups have quota for which models with configurable token rate limits. Subscriptions include priority levels for users belonging to multiple groups and optional metadata for cost allocation.
- MaaSAuthPolicy
- Authorizes groups to access model endpoints through the API gateway. Subscriptions control token limits, while authorization policies grant API gateway access.
With these custom resources, administrators can manage MaaS configurations using standard Kubernetes tools and GitOps workflows. Changes to custom resources are automatically reconciled by the platform controllers.
You can view and manage these custom resources using the OpenShift console or OpenShift CLI (oc).
Using the console:
Navigate to Administration
Using the CLI:
List resource instances:
$ oc get maassubscriptions -n models-as-a-service
$ oc get maasmodelrefs -n <namespace>
$ oc get tenants.maas.opendatahub.io -n models-as-a-service
View the YAML configuration of a specific resource:
$ oc get maassubscription <subscription-name> -n models-as-a-service -o yaml
$ oc get tenants.maas.opendatahub.io default-tenant -n models-as-a-service -o yaml
1.2. Prerequisites for Models-as-a-Service
Before deploying Models-as-a-Service (MaaS) in Red Hat OpenShift AI, verify that your cluster has the required platform components, operators, and infrastructure resources, and that the necessary configuration flags are enabled.
Platform and access requirements:
- You have a cluster with OpenShift version 4.19.9 or later.
- You have cluster administrator access to install Operators and create cluster-scoped resources.
- Your cluster has a functional ingress controller with valid TLS certificates for external access.
- You have installed the OpenShift CLI (oc).
Operator requirements:
- You have installed Red Hat OpenShift AI 3.4 or later.
- If you plan to use distributed inference with llm-d, you have completed the setup steps in Enabling distributed inference and configured authentication as described in Enabling authentication and authorization for LLM Inference Service.
- You have installed the Red Hat Connectivity Link Operator version 1.2 or later in the openshift-operators namespace and created a Kuadrant custom resource in the kuadrant-system namespace with ready status. For installation and configuration instructions, see Configuring authentication for llm-d using Red Hat Connectivity Link.
- If you plan to use the vLLM runtime with Models-as-a-Service, you have set spec.dashboardConfig.vLLMDeploymentOnMaaS to true in the OdhDashboardConfig custom resource. The vLLM runtime for Models-as-a-Service is available as a Technology Preview feature. For more information, see Technology Preview Features Support Scope.
Infrastructure requirements:
- You have created a DataScienceCluster resource with the kserve component set to Managed.
- You have enabled User Workload Monitoring on your OpenShift cluster. User Workload Monitoring is required for MaaS to collect and expose usage metrics for token consumption, request counts, and rate limiting. Without User Workload Monitoring enabled, the MaaS installation shows a Degraded status. For information about enabling User Workload Monitoring, see Enabling monitoring for user-defined projects.
- You have deployed a PostgreSQL database instance, version 14 or later, that is reachable from the OpenShift cluster network. This database is required for API key lifecycle management. OpenShift AI does not provide a PostgreSQL database. You must provision and manage your own PostgreSQL instance.
- You have created a Secret named maas-db-config in the redhat-ods-applications namespace containing the PostgreSQL database connection details. For configuration instructions, see Configure the database secret for Models-as-a-Service.
- You have created a GatewayClass resource configured for the OpenShift Gateway Controller (openshift.io/gateway-controller) and a Gateway named maas-default-gateway in the openshift-ingress namespace. The Gateway resource must include the following annotations:
  - opendatahub.io/managed: "false" - Prevents the ODH Model Controller from overriding MaaS-managed authorization policies.
  - security.opendatahub.io/authorino-tls-bootstrap: "true" - Enables TLS communication between the Gateway and Authorino.
  For information about creating Gateway API resources, see Enabling the Gateway API. For example gateway configuration templates, see MaaS gateway configuration examples. These examples are community-maintained and are not supported by Red Hat.
- You have configured TLS for Authorino and the MaaS API gateway. For configuration instructions, see Configure TLS for Models-as-a-Service.
Deploying large language models might require additional dependencies based on the model size and serving runtime. For comprehensive model serving infrastructure requirements, see Component requirements.
MaaS configuration:
- You have set spec.components.kserve.modelsAsService.managementState to Managed in the DataScienceCluster custom resource.
Dashboard configuration:
- You have set spec.dashboardConfig.modelAsService to true in the OdhDashboardConfig custom resource.
To access MaaS user-facing features in the dashboard:
- You have set spec.components.llamastackoperator.managementState to Managed in the DataScienceCluster custom resource. For more information, see Activating the Llama Stack Operator.
- You have set spec.dashboardConfig.genAiStudio to true in the OdhDashboardConfig custom resource.
To access MaaS administrative features in the dashboard:
- You have set spec.dashboardConfig.maasAuthPolicies to true in the OdhDashboardConfig custom resource.
To enable the MaaS observability dashboard for usage monitoring (optional):
- You have set spec.dashboardConfig.observabilityDashboard to true in the OdhDashboardConfig custom resource. For more information, see Managing observability.
- You have configured observability for OpenShift AI, including metrics storage in the DSCInitialization resource and the Cluster Observability Operator. For complete setup instructions, see Managing observability.
- You have enabled observability in the Kuadrant custom resource. For more information, see Enable Kuadrant observability for Models-as-a-Service.
- You have enabled telemetry in the Tenant custom resource. For more information, see Enable telemetry for Models-as-a-Service.
The MaaS observability dashboard is a Technology Preview feature in Red Hat OpenShift AI. For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.
1.3. Configure the database secret for Models-as-a-Service
You must create a secret that contains your PostgreSQL database connection details. This database is required for API key lifecycle management in Models-as-a-Service.
Prerequisites
- You have deployed a PostgreSQL database instance, version 14 or later, that is reachable from the OpenShift cluster network. OpenShift AI does not provide a PostgreSQL database. You must provision and manage your own PostgreSQL instance before proceeding.
- You have access to the OpenShift CLI (oc).
- You have permissions to create secrets in the redhat-ods-applications namespace.
Procedure
Create the maas-db-config secret in the redhat-ods-applications namespace. Quote the connection string so that the shell does not interpret characters such as ? in it:
$ oc create secret generic maas-db-config \
    -n redhat-ods-applications \
    --from-literal=DB_CONNECTION_URL='postgresql://<username>:<password>@<hostname>:<port>/<database>?sslmode=require'
where:
- <username>: Specifies the PostgreSQL database username.
- <password>: Specifies the PostgreSQL database password.
- <hostname>: Specifies the hostname or IP address of the PostgreSQL server.
- <port>: Specifies the port number for the PostgreSQL server, typically 5432.
- <database>: Specifies the name of the PostgreSQL database.
The following example shows a complete connection string:
postgresql://maasadmin:XXXXX@pg.example.com:5432/maasdb?sslmode=require
Optional: Restart the maas-api deployment to apply the configuration if modelsAsService is already set to Managed in the DataScienceCluster resource:
$ oc rollout restart deployment/maas-api -n redhat-ods-applications
This step is not required if the secret exists before you enable modelsAsService in the DataScienceCluster resource.
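Because the connection string is a URI, a username or password that contains reserved characters such as @, :, or / generally has to be percent-encoded before it is placed in DB_CONNECTION_URL. The following is a minimal Python sketch of building the value. The helper name build_db_url and the sample credentials are hypothetical and are not part of MaaS.

```python
from urllib.parse import quote


def build_db_url(username, password, hostname, port, database, sslmode="require"):
    """Build a PostgreSQL connection URI with percent-encoded credentials."""
    # safe="" makes quote() encode every reserved character, including "/" and ":".
    user = quote(username, safe="")
    secret = quote(password, safe="")
    return f"postgresql://{user}:{secret}@{hostname}:{port}/{database}?sslmode={sslmode}"


if __name__ == "__main__":
    # Hypothetical credentials, for illustration only.
    print(build_db_url("maasadmin", "p@ss:w/rd", "pg.example.com", 5432, "maasdb"))
```

The printed value can then be passed, inside single quotes, to the --from-literal=DB_CONNECTION_URL= argument.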
Verification
Verify that the maas-db-config secret exists in the redhat-ods-applications namespace:
$ oc get secret maas-db-config -n redhat-ods-applications
Expected output:
NAME             TYPE     DATA   AGE
maas-db-config   Opaque   1      5s
1.4. Configure TLS for Models-as-a-Service
To enable secure authentication and authorization for model endpoints, you must configure TLS communication between Authorino and the Models-as-a-Service (MaaS) API service.
Prerequisites
- You have installed the Red Hat Connectivity Link Operator and created a Kuadrant custom resource.
- You have access to the OpenShift CLI (oc).
- You have cluster administrator privileges for the OpenShift cluster where OpenShift AI is installed.
Procedure
Annotate the Authorino service to enable service serving certificate generation in OpenShift:
$ oc annotate service authorino-authorino-authorization \
    -n kuadrant-system \
    service.beta.openshift.io/serving-cert-secret-name=authorino-server-cert \
    --overwrite
The service-ca-operator generates a TLS certificate signed by the cluster service CA and stores it in the authorino-server-cert secret.
Patch the Authorino custom resource to enable the TLS listener:
$ oc patch authorino authorino -n kuadrant-system --type=merge --patch '
{
  "spec": {
    "listener": {
      "tls": {
        "enabled": true,
        "certSecretRef": {
          "name": "authorino-server-cert"
        }
      }
    }
  }
}'
Authorino uses the generated certificate for inbound TLS communication.
Configure the Authorino deployment with environment variables for TLS certificate validation:
$ oc -n kuadrant-system set env deployment/authorino \
    SSL_CERT_FILE=/etc/ssl/certs/openshift-service-ca/service-ca-bundle.crt \
    REQUESTS_CA_BUNDLE=/etc/ssl/certs/openshift-service-ca/service-ca-bundle.crt
The cluster CA bundle is automatically populated by the service-ca-operator in OpenShift.
Annotate your Gateway resource to enable automatic TLS configuration:
$ oc annotate gateway maas-default-gateway \
    -n openshift-ingress \
    security.opendatahub.io/authorino-tls-bootstrap="true" \
    --overwrite
The MaaS controller detects this annotation and creates an EnvoyFilter resource that configures the Envoy proxy to use TLS when communicating with Authorino.
Verification
Verify that the Authorino service has the serving certificate annotation. Dots inside the annotation key are escaped with a backslash so that JSONPath does not treat them as path separators:
$ oc get service authorino-authorino-authorization -n kuadrant-system -o jsonpath='{.metadata.annotations.service\.beta\.openshift\.io/serving-cert-secret-name}'
Expected output:
authorino-server-cert
Verify that the authorino-server-cert secret exists:
$ oc get secret authorino-server-cert -n kuadrant-system
Expected output:
NAME                    TYPE                DATA   AGE
authorino-server-cert   kubernetes.io/tls   2      5m
Verify that the Authorino CR has TLS enabled:
$ oc get authorino authorino -n kuadrant-system -o jsonpath='{.spec.listener.tls.enabled}'
Expected output:
true
Verify that the Authorino deployment has the TLS certificate environment variables configured:
$ oc get deployment/authorino -n kuadrant-system -o jsonpath='{.spec.template.spec.containers[0].env[?(@.name=="SSL_CERT_FILE")].value}'
Expected output:
/etc/ssl/certs/openshift-service-ca/service-ca-bundle.crt
Verify that the Gateway has the TLS bootstrap annotation:
$ oc get gateway maas-default-gateway -n openshift-ingress -o jsonpath='{.metadata.annotations.security\.opendatahub\.io/authorino-tls-bootstrap}'
Expected output:
true
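Annotation keys such as service.beta.openshift.io/serving-cert-secret-name contain dots, which the JSONPath syntax accepted by oc and kubectl otherwise reads as path separators. The following Python sketch shows one way to build an escaped expression for a -o jsonpath= argument. The helper name annotation_jsonpath is hypothetical.

```python
def annotation_jsonpath(key):
    """Return a kubectl/oc JSONPath expression that reads one annotation."""
    # A bare "." would be read as a path separator, so escape each one.
    escaped = key.replace(".", "\\.")
    return "{.metadata.annotations." + escaped + "}"


if __name__ == "__main__":
    print(annotation_jsonpath("service.beta.openshift.io/serving-cert-secret-name"))
```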
1.5. Verify Models-as-a-Service deployment
After you deploy Models-as-a-Service, you can run a series of checks to confirm that the required custom resources, monitoring components, and tenant configuration are in place.
Prerequisites
- You have deployed Models-as-a-Service.
- You have access to the OpenShift CLI (oc).
- You have cluster administrator privileges for the OpenShift cluster where OpenShift AI is installed.
Procedure
Verify that the Models-as-a-Service (MaaS) custom resource definitions (CRDs) are installed:
$ oc get crd | grep maas.opendatahub.io
Expected output shows the following CRDs:
maasauthpolicies.maas.opendatahub.io
maasmodelrefs.maas.opendatahub.io
maassubscriptions.maas.opendatahub.io
externalmodels.maas.opendatahub.io
tenants.maas.opendatahub.io
Verify that User Workload Monitoring is enabled on the cluster. The dot in the config.yaml key is escaped so that JSONPath reads it as part of the key name:
$ oc get configmap cluster-monitoring-config -n openshift-monitoring -o jsonpath='{.data.config\.yaml}' | grep enableUserWorkload
Expected output:
enableUserWorkload: true
If User Workload Monitoring is not enabled, the MaaS deployment might show as Degraded. For information about enabling User Workload Monitoring, see Enabling monitoring for user-defined projects.
Verify that the Tenant custom resource exists in the models-as-a-service namespace:
$ oc get tenants.maas.opendatahub.io -n models-as-a-service
Expected output shows at least one Tenant resource:
NAME             AGE
default-tenant   5m
Check the status of the Tenant custom resource:
$ oc get tenants.maas.opendatahub.io default-tenant -n models-as-a-service -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}'
Expected output:
True
The following values indicate the deployment status:
- True: Indicates that the MaaS deployment is successful and all prerequisites are met.
- False or Degraded: Indicates missing prerequisites or configuration issues. Check the condition message for details:
$ oc get tenants.maas.opendatahub.io default-tenant -n models-as-a-service -o jsonpath='{.status.conditions[?(@.type=="Ready")].message}'
Verify that the Tenant custom resource is configured:
$ oc get tenants.maas.opendatahub.io default-tenant -n models-as-a-service
Expected output:
NAME             READY   REASON
default-tenant   True    AllComponentsReady
Optional: If you plan to use external models, verify that the ExternalModel CRD is available:
$ oc get crd externalmodels.maas.opendatahub.io
Expected output shows the CRD details:
NAME                                 CREATED AT
externalmodels.maas.opendatahub.io   2026-04-28T10:15:30Z
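The Ready check in this procedure can also be scripted. The following Python sketch assumes that you have the Tenant resource as parsed JSON, for example from oc get with the -o json option, and that it exposes the status.conditions fields (type, status, message) that the jsonpath queries in this procedure read. The function name ready_condition and the sample data are hypothetical.

```python
import json


def ready_condition(tenant):
    """Return (status, message) of the Ready condition, or (None, None) if absent."""
    for cond in tenant.get("status", {}).get("conditions", []):
        if cond.get("type") == "Ready":
            return cond.get("status"), cond.get("message")
    return None, None


if __name__ == "__main__":
    # Hypothetical sample shaped like the fields that the jsonpath queries read.
    sample = json.loads("""
    {"status": {"conditions": [
        {"type": "Ready", "status": "False", "message": "example message"}
    ]}}
    """)
    status, message = ready_condition(sample)
    print(status, message)
```

Returning (None, None) when no Ready condition exists lets a caller tell "not yet reported" apart from an explicit False.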
Verification
- All MaaS CRDs are deployed and available.
- User Workload Monitoring is enabled on the cluster.
- The Tenant custom resource shows a Ready status with reason AllComponentsReady.
Troubleshooting
If the Tenant status shows False or Degraded:
- Verify that User Workload Monitoring is enabled on the cluster.
- Verify that all prerequisites in Prerequisites for Models-as-a-Service are met.
- Check that the PostgreSQL database secret maas-db-config exists in the redhat-ods-applications namespace.
- Verify that Red Hat Connectivity Link is installed and the Kuadrant custom resource is ready.
- Review the Tenant condition messages for specific error details.
1.6. Publish models with Models-as-a-Service
In Red Hat OpenShift AI, you can deploy generative AI models through the dashboard wizard and publish them to Models-as-a-Service (MaaS) so that administrators can enforce subscription-based quota and token limits.
Prerequisites
- You are logged in to the OpenShift AI dashboard.
- You have administrator access to a project in OpenShift AI.
- Your cluster administrator has installed the required operators and infrastructure for Models-as-a-Service. For more information, see Prerequisites for Models-as-a-Service.
- If you plan to use distributed inference with llm-d, your cluster administrator has completed the setup steps in Enabling distributed inference and configured authentication as described in Enabling authentication and authorization for LLM Inference Service.
- If you plan to use the vLLM runtime with Models-as-a-Service, your cluster administrator has enabled the vLLM deployment on MaaS feature flag (spec.dashboardConfig.vLLMDeploymentOnMaaS: true) in the OdhDashboardConfig custom resource. vLLM runtime support for Models-as-a-Service is a Technology Preview feature.
Procedure
- In the left navigation menu, click Projects.
- Click the name of the project where you want to deploy the model.
- Click the Deployments tab.
- Click Deploy model to open the wizard.
- In the Model details section, specify your storage connection and model path.
Complete the model configuration based on whether vLLMDeploymentOnMaaS is enabled in the OdhDashboardConfig custom resource:
- If vLLMDeploymentOnMaaS is enabled
  - In the Model details section, select Generative AI model (Example, LLM) as the model type.
  - Make sure that the Use legacy deployment method checkbox is unchecked.
  - Click Next.
  - In the Model deployment section, enter a unique model deployment name using lowercase letters, numbers, and hyphens.
  - Select an appropriate hardware profile for your model.
  - Select one of the following deployment resources:
    - Distributed inference with llm-d for distributed inference support.
    - A vLLM-based LLMInferenceServiceConfig, such as vLLM NVIDIA CUDA GPU LLMInferenceServiceConfig, for vLLM-based serving as a Technology Preview feature in Red Hat OpenShift AI.
- If vLLMDeploymentOnMaaS is not enabled
  - In the Model details section, select Generative AI model (Example, LLM) as the model type.
  - Click Next.
  - In the Model deployment section, enter a unique model deployment name using lowercase letters, numbers, and hyphens.
  - Select an appropriate hardware profile for your model.
  - Select Distributed inference with llm-d as the deployment resource.
Note: Models deployed for MaaS use the LLMInferenceService architecture, which is designed for large language models and integrates with the MaaS gateway for subscription-based quota enforcement. The legacy deployment method uses traditional KServe InferenceService resources with serving runtimes.
- In the Model deployment section, in the Number of replicas field, enter the number of replicas to deploy. The default is 1. For production workloads, consider deploying at least 2 replicas for high availability.
- Click Next.
In the Advanced settings section, configure MaaS publishing:
- Under Model availability, select Publish as MaaS to make the model accessible to users through the MaaS gateway.
  Note: Publishing as MaaS creates a MaaSModelRef object that registers the model with MaaS for subscription assignment. After publishing, an administrator must create a subscription and add this model to make it accessible to user groups.
- Optional: Select Add custom runtime arguments or Add custom runtime environment variables to customize model behavior.
- Click Next.
In the Review section, verify your configuration settings:
- Review the model details, deployment configuration, and advanced settings.
- Click Deploy model.
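The procedure asks for a model deployment name made of lowercase letters, numbers, and hyphens. The following Python sketch checks a candidate name before you type it into the wizard. The extra constraints (starting and ending with a letter or digit, at most 63 characters) are an assumption borrowed from the Kubernetes DNS-label convention and are not stated in this procedure. The function name and sample names are hypothetical.

```python
import re

# Assumption: names follow the Kubernetes DNS-label convention (start and end
# with a letter or digit, at most 63 characters). The procedure itself only
# states "lowercase letters, numbers, and hyphens".
NAME_PATTERN = re.compile(r"[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?")


def is_valid_deployment_name(name):
    """Return True if name fits the assumed lowercase DNS-label pattern."""
    return NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("granite-3-8b-instruct", "Granite_8B", "-leading-hyphen"):
        print(candidate, is_valid_deployment_name(candidate))
```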
Verification
- Verify that the model appears on the Deployments tab with a checkmark in the Status column.
Verify that the model was published to MaaS:
$ oc get maasmodelref -n <your-project-namespace>
You should see a MaaSModelRef object for your deployed model.
Models published to MaaS require subscription and authorization policy configuration before users can access them.
1.7. Models-as-a-Service subscriptions
In Red Hat OpenShift AI, you can use Models-as-a-Service (MaaS) subscriptions to manage quotas and token limits for AI model serving. With subscriptions, you can grant specific groups quotas for models with configurable token limits based on user group membership.
In OpenShift AI 3.3, MaaS used a tier-based model for access control. Starting with OpenShift AI 3.4, tiers have been replaced with subscriptions. The subscription model provides more flexibility by allowing users to belong to multiple subscriptions and uses custom resource definition (CRD)-based configuration for improved GitOps compatibility.
1.7.1. Subscription-based access control
When multiple teams share large language models, you can use subscriptions to perform the following tasks:
- Prevent resource exhaustion by enforcing token limits per model
- Provide different access levels for different user groups
- Track and allocate costs based on team consumption
- Control which teams can access high-cost or sensitive models
- Allow users to belong to multiple subscriptions based on their group memberships
MaaS assigns users to subscriptions based on their group membership. When a user belongs to multiple groups with different subscriptions, the subscription with the highest priority level is used by default for API keys that the user creates without specifying a subscription.
1.7.2. Subscription properties
MaaS subscriptions are defined as MaaSSubscription custom resources in the cluster. Each subscription has the following properties:
- Name
- A unique identifier for the subscription that becomes the Kubernetes resource name.
- Description
- An optional human-readable description shown in the dashboard.
- Groups
- One or more groups whose members can access this subscription. Groups can come from OpenShift Group objects or external OIDC providers. Users can belong to multiple groups and therefore have access to multiple subscriptions.
- Priority level
- A numeric value that determines subscription precedence when creating an API key without specifying a subscription. Higher numbers indicate higher priority, with 0 as the lowest. Priority only applies during API key creation.
- Models
- A list of models that this subscription gives quota for, with configurable token limits for each model.
1.7.3. Priority levels
Priority levels determine which subscription is selected when creating an API key without explicitly specifying a subscription. When a user belongs to multiple groups with different subscriptions, the subscription with the highest priority is selected as the default.
For example, if a user belongs to both the analytics-team group with priority 1 and the production-apps group with priority 2, creating a key without specifying a subscription selects the production-apps subscription because it has the higher priority.
When creating API keys, specifying the subscription explicitly bypasses priority selection.
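The default selection described above can be modelled in a few lines of Python. This is an illustrative sketch of the documented rule only, not the MaaS implementation. The Subscription class, its fields, and the function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subscription:
    name: str
    groups: frozenset
    priority: int = 0


def default_subscription(user_groups, subscriptions):
    """Pick the highest-priority subscription that shares a group with the user."""
    eligible = [s for s in subscriptions if s.groups & set(user_groups)]
    if not eligible:
        return None
    return max(eligible, key=lambda s: s.priority)


if __name__ == "__main__":
    subs = [
        Subscription("analytics", frozenset({"analytics-team"}), priority=1),
        Subscription("production", frozenset({"production-apps"}), priority=2),
    ]
    chosen = default_subscription({"analytics-team", "production-apps"}, subs)
    print(chosen.name)  # production
```

In this sketch, ties between equal priorities fall to whichever subscription is listed first. That tie-breaking is a side effect of the example and is not documented MaaS behavior.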
1.7.3.1. Priority level recommendations
Use a consistent priority numbering scheme to make subscription precedence clear and maintainable.
Recommended priority scheme:
- Production workloads: 100
- Use this priority for customer-facing applications, production APIs, and critical business processes.
- Staging and pre-production: 50
- Use this priority for QA testing, user acceptance testing, and performance testing.
- Development and experimentation: 0
- Use this priority for exploratory data science work, prototype development, and learning. This is the default value.
- Personal and sandbox: -10
- Use this priority for individual experimentation, tutorials, and non-business use.
Common use cases:
- Separate production and development resources
- Create a production subscription with priority 100 for stricter quotas billed to production cost centers, and a development subscription with priority 0 for generous quotas billed to R&D. Production applications automatically use the production subscription, while developers must explicitly select the development subscription for testing.
- Team-based access with overlapping membership
- When users belong to multiple teams, assign higher priority to broader access. For example, set "ML Platform Team" to priority 10 for access to all models and "Analytics Team" to priority 5 for analytics-focused models. Users in both teams default to the ML Platform Team subscription.
- Cost-tiered model access
- Create a "Standard Models" subscription with priority 10 for cheaper models with higher quotas, and a "Premium Models" subscription with priority 0 for expensive models with limited quotas. Users consume cheaper resources by default and must explicitly select the premium subscription when needed.
Configuration guidance:
- Use incremental priority values with reasonable gaps such as 10, 20, 30 rather than 0, 1000, 10000. This makes it easier to insert intermediate priorities later.
- Avoid setting all subscriptions to the same priority. When priorities tie, the default subscription selected for a user who belongs to several groups becomes unpredictable.
- Use priority for convenience and defaults, not for access control. Use authorization policies to restrict which users can access which models.
1.7.4. Token limits
Tokens are the basic units of text processing in large language models. Token limits control the maximum number of tokens that can be consumed per request or time period for a specific model.
Configure token limits for each model when you create or edit a subscription through the dashboard.
Each model in a subscription can have different token limit configurations, allowing administrators to provide varying levels of access to different models within the same subscription.
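To make the idea of a token limit per time period concrete, the following Python sketch implements a simple fixed-window budget. How MaaS actually enforces limits is not described here, so the fixed-window approach, the class name TokenWindow, and its parameters are all assumptions chosen for illustration.

```python
import time


class TokenWindow:
    """Fixed-window token budget: at most `limit` tokens per `window_seconds`."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the behavior can be tested
        self.window_start = clock()
        self.used = 0

    def try_consume(self, tokens):
        """Record `tokens` and return True, or return False if over budget."""
        now = self.clock()
        if now - self.window_start >= self.window_seconds:
            # A new window has begun, so the counter resets.
            self.window_start = now
            self.used = 0
        if self.used + tokens > self.limit:
            return False
        self.used += tokens
        return True
```

A request that would exceed the remaining budget is rejected as a whole and consumes nothing, so a smaller request in the same window can still succeed.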
1.7.5. Relationship with authorization policies
Subscriptions and authorization policies work together to control model access in the following ways:
- Subscriptions give groups quota for specific models with token rate limits.
- Authorization policies grant groups access to model endpoints through the API gateway.
Both are required for users to access models through MaaS. A subscription defines quota limits for model usage, while an authorization policy enables API gateway access.
When you create a subscription, you can optionally create a matching authorization policy by selecting the Create matching authorization policy checkbox. The authorization policy uses the same groups and models as the subscription, so users can access the models as soon as the subscription is created.
For more information about authorization policies, see Manage Models-as-a-Service authorization policies.
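The "both are required" rule can be illustrated with a small Python model. This is a sketch of the documented relationship only. The function name and the representation of subscriptions and authorization policies as (groups, models) pairs are hypothetical and do not reflect the custom resource schemas.

```python
def can_use_model(user_groups, model, subscriptions, auth_policies):
    """Return True only if a subscription AND an authorization policy cover the model."""
    groups = set(user_groups)

    def covers(rules):
        # Each rule is a (groups, models) pair of sets.
        return any(groups & rule_groups and model in rule_models
                   for rule_groups, rule_models in rules)

    return covers(subscriptions) and covers(auth_policies)


if __name__ == "__main__":
    subs = [({"analytics-team"}, {"model-a"})]
    policies = [({"analytics-team"}, {"model-a"})]
    print(can_use_model({"analytics-team"}, "model-a", subs, policies))  # True
    print(can_use_model({"analytics-team"}, "model-a", subs, []))        # False
```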
1.8. Manage Models-as-a-Service subscriptions
You can create and manage Models-as-a-Service (MaaS) subscriptions to control group access to models and configure token limits.
In OpenShift AI 3.4, MaaS uses subscriptions instead of tiers.
1.8.1. View subscriptions
In Red Hat OpenShift AI, you can use the Subscriptions page to monitor service subscriptions, verify their status, and review associated models and priority levels.
Prerequisites
- You are logged in to the OpenShift AI dashboard.
- You have administrator access to the OpenShift AI dashboard.
Procedure
- In the OpenShift AI dashboard, click Settings → Subscriptions.
Review the information in the Subscriptions table:
- Name
- The unique identifier for the subscription.
- Phase
- The current status of the subscription. Possible values: Active, Failed.
- Groups
- The number of OpenShift or custom user groups assigned to the subscription.
- Models
- The count of Models-as-a-Service (MaaS) model references included in the subscription.
- Priority
- The priority level assigned to the subscription. Higher numbers indicate higher priority.
Optional: To filter and organize the view:
- Use the Keyword dropdown to select a filter criterion.
- Enter text in the Filter by name or description field to search for specific subscriptions.
- Click the column headers to sort by Name, Phase, Groups, Models, or Priority.
- Click a subscription name to open the details pane.
- In the details view, review the subscription configuration including groups, models, token limits, and synchronization status.
Verification
- Verify that the Subscriptions page displays all configured subscriptions in the system.
- Verify that subscriptions show the correct phase status: Active or Failed.
- Click a subscription name and verify that you can view its complete configuration.
1.8.2. Create a subscription for Models-as-a-Service
In Red Hat OpenShift AI, you can create a Models-as-a-Service (MaaS) subscription to grant user groups quota for specific models with configurable token limits.
Prerequisites
- You are logged in to the OpenShift AI dashboard.
- You have administrator access to the OpenShift AI dashboard.
- You have published at least one model to Models-as-a-Service (MaaS).
Procedure
- In the OpenShift AI dashboard, click Settings → Subscriptions.
- Click Create subscription.
In the Create subscription form, configure the subscription:
- Name: Enter a descriptive name for the subscription.
- Optional: Click Edit resource name to set a custom internal identifier for the subscription. If not specified, the resource name is generated automatically from the display name.
- Optional: Description: Provide a brief description of the purpose of the subscription.
- Priority: Set the subscription priority level using a numeric value. Higher numbers indicate higher priority. When a user belongs to multiple groups with different subscriptions, the subscription with the highest priority level is used.
- Groups: Select OpenShift groups or enter custom group names from your external OIDC provider. Users who are members of these groups can access the models included in this subscription.
Configure models and token limits:
- Click Add models.
- In the Add models dialog, review the available models and their associated projects, model IDs, existing subscriptions, and policies.
- Click Add model for each model you want to include in this subscription.
- Click Add models to confirm your selection.
- For each model in the subscription, click Add token limit to configure token consumption limits.
- In the Edit subscription token limits dialog, enter the number of tokens allowed.
- Enter the time period value.
- Select the time unit: hour, minute, or second.
- Optional: Click Add token rate limit to configure additional token limits with different time windows.
Click Save.
Note: At least one token limit is required for each model in the subscription.
Optional: Select Create a matching authorization policy to automatically create an authorization policy for this subscription.
Note: To consume model endpoints through the API gateway, users must have both a subscription and an authorization policy. A subscription defines quota for models. An authorization policy is a separate resource that authorizes specific groups to access model endpoints through the API gateway. You can create a matching authorization policy automatically during subscription creation, but creating authorization policies separately provides more flexibility and control.
Important: The matching authorization policy feature creates an initial authorization policy with the same groups and models as the subscription. Later changes to the subscription, such as adding or removing groups or models, are not applied to the authorization policy. When you modify a subscription, you must manually update the corresponding authorization policy to keep them synchronized.
- Click Create subscription.
Verification
- In the OpenShift AI dashboard, navigate to Settings → Subscriptions and verify that the new subscription appears in the list with a phase status of Active.
- Click the subscription name to view its details and confirm the groups, models, token limits, and priority level are configured correctly.
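The priority rule from the procedure can be sketched as follows. This is a minimal illustration, assuming that selection happens per model and that the candidate set is every subscription that lists the model and shares a group with the user. The `Subscription` type and `select_subscription` function are hypothetical, and tie-breaking between equal priorities is not described in this document.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Subscription:
    name: str
    priority: int
    groups: frozenset[str]
    models: frozenset[str]


def select_subscription(
    user_groups: set[str],
    model: str,
    subscriptions: list[Subscription],
) -> Optional[Subscription]:
    """Pick the highest-priority subscription that applies to the user and model."""
    candidates = [
        s for s in subscriptions
        if model in s.models and s.groups & user_groups
    ]
    # Higher numbers indicate higher priority. With equal priorities, max()
    # returns the first candidate; that tie-break is an assumption.
    return max(candidates, key=lambda s: s.priority, default=None)
```

A user who belongs to both an `everyone` group (priority 1) and an `ml-team` group (priority 10) would be served under the `ml-team` subscription for any model that both subscriptions include.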
Next steps
- If you did not select Create a matching authorization policy, see Create an authorization policy.
- Create an API key for a user
1.8.3. Edit a subscription
In Red Hat OpenShift AI, you can edit a Models-as-a-Service (MaaS) subscription to add or remove models, change token limits, or update group assignments. Changes modify the MaaSSubscription custom resource.
Prerequisites
- You are logged in to the OpenShift AI dashboard.
- You have administrator access to the OpenShift AI dashboard.
- At least one MaaS subscription exists.
Edits to a MaaSSubscription resource take effect when you save the changes:
- Changing token limits affects all users in the subscription groups.
- Adding or removing groups grants or revokes subscription quota for those groups.
- Adding or removing models changes which models are available to users in the subscription groups.
Procedure
- In the OpenShift AI dashboard, click Settings → Subscriptions.
- Locate the subscription you want to modify and click its name.
- Click the action menu (⋮), and then select Edit.
Modify the subscription properties:
- Update the name or description if needed.
- Adjust the priority level.
- Add or remove groups.
- Add or remove models by clicking Add models or using the action menu (⋮) next to each model.
Modify token limits for existing models by clicking Add token limit or editing existing limits.
Important: The metadata.name of the MaaSSubscription custom resource cannot be changed after creation. If you need a different resource name, delete the subscription and create a new one.
Click Save.
Important: If you created a matching authorization policy when you created this subscription, changes to the subscription, such as adding or removing groups or models, are not applied to the authorization policy. After modifying a subscription, manually update the corresponding authorization policy to keep them synchronized.
Verification
- In the OpenShift AI dashboard, navigate to Settings → Subscriptions.
- Verify that the subscription shows the updated configuration.
- Optional: Check the MaaSSubscription custom resource to confirm the changes:
$ oc get maassubscription <subscription-name> -n models-as-a-service -o yaml
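Because a matching authorization policy is not updated when you edit a subscription, a small comparison can help spot drift. The following sketch assumes that you have already extracted the group and model names of both resources into plain dictionaries; the `policy_drift` function and the dictionary shape are hypothetical and do not mirror the custom resource schema.

```python
def policy_drift(subscription: dict, policy: dict) -> dict:
    """Report groups and models that differ between a subscription and its policy."""
    drift = {}
    for key in ("groups", "models"):
        wanted = set(subscription.get(key, []))
        actual = set(policy.get(key, []))
        if wanted != actual:
            drift[key] = {
                "missing_from_policy": sorted(wanted - actual),
                "extra_in_policy": sorted(actual - wanted),
            }
    return drift
```

An empty result means the two resources name the same groups and models.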
1.8.4. Delete a subscription
In Red Hat OpenShift AI, you can delete a Models-as-a-Service (MaaS) subscription to revoke quota for specific user groups. Deleting a subscription removes the quota that it grants to every group included in that subscription.
Prerequisites
- You have access to the OpenShift AI dashboard with administrator privileges.
- You have created at least one MaaS subscription.
Deleting a subscription revokes quota for all groups included in that subscription. Make sure that affected users have access to models through other subscriptions before deletion to avoid service disruption.
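One way to check the impact before deletion is to compute which group and model combinations are covered only by the subscription you plan to delete. This is a minimal sketch under the assumption that subscriptions are available as a dictionary of names to group and model lists; the `quota_lost_on_delete` function and the data shape are hypothetical.

```python
def quota_lost_on_delete(target: str, subscriptions: dict) -> set:
    """Return (group, model) pairs that only the target subscription covers."""

    def pairs(sub: dict) -> set:
        return {(g, m) for g in sub["groups"] for m in sub["models"]}

    remaining = set()
    for name, sub in subscriptions.items():
        if name != target:
            remaining |= pairs(sub)
    return pairs(subscriptions[target]) - remaining
```

Any pair in the result identifies a group that loses quota for that model unless another subscription is created first.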
Procedure
- In the OpenShift AI dashboard, click Settings → Subscriptions.
- Locate the subscription you want to delete and click its name.
- Click the action menu (⋮), and then select Delete.
In the Delete subscription dialog, review the impact of deletion:
- Users in groups assigned to this subscription lose access to the included models.
- All API keys bound to this subscription are invalidated.
- Authorization policies are not deleted automatically. After deletion, review and update any authorization policies that referenced this subscription’s groups or models.
- To confirm, type the subscription name and click Delete.
Verification
- In the OpenShift AI dashboard, navigate to Settings → Subscriptions.
- Verify that the deleted subscription no longer appears in the list.
Optional: List the remaining subscriptions to confirm the expected state:
$ oc get maassubscription -n models-as-a-service
Next steps
- View authorization policies to review and update policies that referenced groups or models in the deleted subscription.
1.9. Manage Models-as-a-Service authorization policies
You can create and manage authorization policies to control which groups can access AI model endpoints through the API gateway.
1.9.1. Models-as-a-Service authorization policies
In Red Hat OpenShift AI, you can use Models-as-a-Service (MaaS) authorization policies in combination with subscriptions to control user access to model endpoints through the API gateway.
A subscription gives groups quota for specific models with token rate limits. An authorization policy is required to authorize groups to access model endpoints through the API gateway.
Both a subscription and an authorization policy are required for users to access models through Models-as-a-Service:
- Subscription: Defines quota for models with token rate limits.
- Authorization policy: Authorizes groups to access model endpoints through the API gateway.
Without an authorization policy, users receive 403 Forbidden errors even if they have a valid subscription.
An authorization policy consists of the following components:
- Name
- A unique identifier for the policy.
- Description
- An optional description explaining the purpose of the policy.
- Groups
- The groups authorized to access model endpoints through this policy. Groups can come from OpenShift Group objects or external OIDC providers depending on the authentication method.
- Models
- The model endpoints that authorized groups can access.
- Phase
- The current status of the policy: Active, Failed, or Unknown.
When you create a subscription, you can optionally create a matching authorization policy automatically by selecting the Create matching authorization policy checkbox. This creates an authorization policy with the same groups and models as the subscription, ensuring that users can immediately access the models included in their subscription.
Authorization policy lifecycle
- Creating policies: Create authorization policies manually or automatically when creating a subscription.
- Active phase: When a policy is in the Active phase, the specified groups can access the configured model endpoints through the API gateway.
- Failed phase: If a policy enters the Failed phase, check the policy status conditions for error messages.
- Updating policies: You can add or remove groups and models from existing policies. Changes take effect immediately.
- Deleting policies: Deleting an authorization policy immediately revokes API gateway access for all groups in that policy.
Authorization policies are managed as Models-as-a-Service (MaaS) custom resources in OpenShift. You can manage them through the OpenShift AI dashboard or by using the OpenShift CLI (oc).
1.9.2. View authorization policies
In Red Hat OpenShift AI, you can view all Models-as-a-Service (MaaS) authorization policies in the dashboard to see which groups can access specific model endpoints.
Prerequisites
- You have access to the OpenShift AI dashboard with administrator privileges.
- At least one MaaS authorization policy exists.
Procedure
In the OpenShift AI dashboard, click Settings → Authorization policies.
The Authorization policies page displays a table with the following columns:
- Name: The name of the authorization policy. Policies created automatically from subscriptions include the subscription name in the policy name.
- Phase: The current status of the policy. Possible values: Active, Failed, Unknown.
- Groups: The number of OpenShift groups authorized by this policy.
- Models: The number of model endpoints included in this policy.
Optional: Filter the list of authorization policies:
- To filter by keyword, click the Keyword dropdown and select a filter option.
- To search by name or description, enter text in the search field.
- Optional: Sort the table by clicking any column header.
To view details of a specific authorization policy:
- Click the action menu (⋮) for the policy you want to view.
Select View details.
The policy details page displays:
- Policy name, description, phase, and resource name
- Creation date
- Groups authorized by this policy
- Models included in this policy, along with the namespace each model is deployed in
Verification
- The Authorization policies table displays all policies with their current status.
- When you view a policy’s details, the Groups and Models sections show the configuration for that policy.
1.9.3. Create an authorization policy
In Red Hat OpenShift AI, you can create authorization policies to control which groups can access AI model endpoints through the API gateway.
Prerequisites
- You have access to the OpenShift AI dashboard with administrator privileges.
- You have published at least one model to Models-as-a-Service (MaaS).
Procedure
- In the OpenShift AI dashboard, click Settings → Authorization policies.
- Click Create authorization policy.
In the Create authorization policy dialog, configure the following settings:
In the Name field, enter a unique name for the authorization policy.
Note: By default, the resource name matches the policy name. To customize the resource name, click Edit resource name and enter a different value. The resource name identifies the underlying MaaSAuthPolicy custom resource in OpenShift.
- Optional: In the Description field, enter a description explaining the purpose of the policy.
From the Groups dropdown, select the groups to authorize for API gateway access.
You can select multiple groups or type to add a new group name. Groups can come from OpenShift Group objects, API key group snapshots, or OIDC token claims depending on how users authenticate.
- In the Models section, click Add models.
- In the Add models to authorization policy dialog, browse the list of available models or use the search field to filter by name or description.
- Review the Subscriptions and Policies columns to see which subscriptions and policies already include each model.
- For each model you want to add, click Add model.
Click Add models to add the selected models to the policy.
The Models section displays a table with the selected models and their project namespaces.
- Click Create authorization policy.
If you create a subscription with the Create matching authorization policy option selected, an authorization policy is created automatically with the same groups and models as the subscription. You only need to create authorization policies manually when you want to configure API gateway access independently of subscriptions.
Verification
- The new authorization policy appears in the Authorization policies table with an Active phase.
1.9.4. Edit an authorization policy
In Red Hat OpenShift AI, you can use the dashboard to edit Models-as-a-Service (MaaS) authorization policies to add or remove authorized groups and model endpoints.
Prerequisites
- You have access to the OpenShift AI dashboard with administrator privileges.
- At least one authorization policy exists.
Edits to an authorization policy take effect when you save your changes:
- Adding groups grants API gateway access to users in those groups.
- Removing groups revokes API gateway access for users in those groups, including users with active sessions.
- Users and applications might experience access changes or receive 403 Forbidden errors based on the new configuration.
Procedure
- In the OpenShift AI dashboard, click Settings → Authorization policies.
- In the row for the authorization policy you want to edit, click the action menu (⋮), and then select Edit.
In the Edit authorization policy dialog, modify the following settings as needed:
- Update the Name field to change the policy display name.
- Update the Description field to change the policy description.
From the Groups dropdown, add or remove groups:
- To add groups, select additional groups from the dropdown or type to add a new group name.
To remove a group, click the remove icon (×) next to the group name.
Note: The available groups depend on your authentication configuration: OpenShift groups when using OpenShift authentication, or OIDC group claims when using external OIDC.
In the Models section, add or remove models:
- To add models, click Add models to open the Add models to authorization policy dialog. Select the models to add, and then click Add models in the dialog to confirm.
- To remove a model, click the action menu (⋮) for the model, and then select Remove.
- Click Save to apply your changes.
Verification
- The updated authorization policy appears in the Authorization policies table. To confirm specific changes, click the policy name to open the detail view.
Confirm the updated configuration on the MaaSAuthPolicy resource:
$ oc get maasauthpolicy <policy-name> -n models-as-a-service -o yaml
1.9.5. Delete a Models-as-a-Service authorization policy
In Red Hat OpenShift AI, you can delete Models-as-a-Service (MaaS) authorization policies to revoke API gateway access for groups included in the policy. If another authorization policy grants the same groups access to the same models, those groups retain access through the remaining policy.
Prerequisites
- You have access to the OpenShift AI dashboard with administrator privileges.
- At least one MaaSAuthPolicy resource exists.
Procedure
-
In the OpenShift AI dashboard, click Settings
Authorization policies. - In the row for the authorization policy you want to delete, click the action menu (⋮), and then select Delete.
- In the Delete policy dialog, enter the authorization policy name to confirm deletion.
- Click Delete.
Deleting an authorization policy revokes API gateway access for all groups in that policy. Users and applications using models covered by this policy receive 403 Forbidden errors even if they have a valid subscription.
If you delete an authorization policy that was automatically created with a subscription, the subscription remains active and continues to enforce token limits. API gateway access requires a new authorization policy.
When removing a user from a group, you must manually revoke all associated API keys to immediately revoke access. Consider setting up automation to revoke API keys when users are removed from groups.
Verification
- The deleted authorization policy no longer appears in the Authorization policies table.
Confirm that the MaaSAuthPolicy resource was deleted:
$ oc get maasauthpolicy <policy-name> -n models-as-a-service
Expected output:
Error from server (NotFound): maasauthpolicies.maas.opendatahub.io "<policy-name>" not found
1.10. Manage API keys for users
You can create and manage API keys on behalf of users to provide them with programmatic access to models through Models-as-a-Service subscriptions.
1.10.1. View API keys
In Red Hat OpenShift AI, you can view and filter all API keys created by users in your organization to monitor key usage and identify keys that might need to be revoked.
Prerequisites
- You have access to the OpenShift AI dashboard with administrator privileges.
Procedure
In the OpenShift AI dashboard, click Gen AI studio → API keys.
The API keys page displays a table with the following columns:
- Name: The name assigned to the API key
- Status: The current state of the key. Possible values: Active, Expired, Revoked.
- Subscription: The subscription that scopes the key’s access to models
- Owner: The username of the key owner
- Created: The date when the key was created
- Last used: The date when the key was last used to access a model
- Expires: The expiration date for the key
Optional: Filter the list of API keys:
- To filter by status, click the Status dropdown and select Active, Expired, or Revoked.
- To filter by username, enter a username in the search field.
- Optional: Sort the table by clicking any column header.
- Optional: If the list contains more keys than fit on one page, use the pagination controls at the bottom of the table to navigate between pages.
Verification
- The API keys table lists at least one key with a Status value of Active, Expired, or Revoked.
- Each key displays information including owner, subscription, creation date, last used date, and expiration.
1.10.2. Create an API key for a user
In Red Hat OpenShift AI, you can create API keys on behalf of users to provide them with programmatic access to models through Models-as-a-Service subscriptions.
Prerequisites
- You have access to the OpenShift AI dashboard with administrator privileges.
- You have created at least one Models-as-a-Service (MaaS) subscription.
Procedure
- In the OpenShift AI dashboard, click Gen AI studio → API keys.
- Click Create API key.
In the Create API key dialog, configure the following settings:
- In the Name field, enter a descriptive name for the API key.
- Optional: In the Description field, enter additional details about what the key is for.
From the Subscription dropdown, select the subscription that determines which models the key can access and the applicable token limits.
The Models section displays the models included in the selected subscription and their configured token limits.
From the Expiration dropdown, select the number of days until the key expires.
Note: You can select an expiration period from 1 to 365 days. The default expiration is 30 days. As an administrator, you can set a maximum expiration limit in the Tenant custom resource. If not set, the default maximum is 90 days.
Click Create.
The API key created dialog displays the generated key with a prefix of sk-oai-.
Important: The plaintext key is displayed only during creation and cannot be retrieved later. Save the key in a secrets manager before closing the API key created dialog. If you lose the key, you must revoke it and create a new one.
- Click the copy icon next to the API key field to copy the key, and then provide it to the user through a secure channel.
- Click Close.
Verification
- The new API key appears in the API keys table with an Active status.
1.10.3. Revoke user API keys
As an administrator, you can revoke individual Models-as-a-Service (MaaS) API keys, or all the keys belonging to a specific user.
API keys retain a snapshot of user group memberships from when the key was created. If a user is removed from a group, their existing API keys continue to grant access until you revoke them. To immediately revoke access after a group change, revoke the API keys for that user.
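The snapshot behavior suggests a simple audit: compare the groups stored with each active key against the owner's current groups. The following sketch illustrates that idea with hypothetical types. The `ApiKey` fields and the `keys_to_revoke` function are assumptions for illustration and do not reflect how MaaS stores keys or exposes group snapshots.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ApiKey:
    name: str
    owner: str
    status: str                      # "Active", "Expired", or "Revoked"
    group_snapshot: frozenset[str]   # groups captured when the key was created


def keys_to_revoke(keys, current_groups: dict) -> list:
    """Return active keys whose snapshot includes groups the owner has left."""
    stale = []
    for key in keys:
        if key.status != "Active":
            continue
        live = current_groups.get(key.owner, set())
        # A key is stale if its snapshot is not a subset of the live groups.
        if not key.group_snapshot <= live:
            stale.append(key)
    return stale
```

Running such a check after each group change is one way to approach the automation that this section recommends.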
Prerequisites
- You have access to the OpenShift AI dashboard with administrator privileges.
- At least one API key exists.
Procedure
To revoke an individual API key:
- In the OpenShift AI dashboard, click Gen AI studio → API keys.
- In the row for the API key you want to revoke, click the action menu (⋮) and select Revoke.
- In the Revoke API key? dialog, enter the API key name to confirm revocation.
- Click Revoke.
To revoke all API keys for a single user:
- In the OpenShift AI dashboard, click Gen AI studio → API keys.
- Click the action menu (⋮) in the table header and select Revoke user API keys.
- In the Revoke user API keys dialog, enter the username in the Username field.
- Click the search icon to display the keys for that user.
- Click Revoke all keys to revoke all API keys for the user.
Verification
- Verify that the revoked API key or keys display a Revoked status in the API keys table.
Verify that the revoked API key cannot make API calls:
$ curl -H "Authorization: Bearer <revoked-api-key>" \
  https://<maas-gateway-url>/maas-api/v1/models
Expected output: A 401 Unauthorized error indicating the key is no longer valid.
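The same check can be scripted. The following sketch uses only the Python standard library and the `/maas-api/v1/models` path from the command above. The function names are hypothetical, and the gateway URL and key are values that you supply.

```python
import urllib.error
import urllib.request


def models_request(gateway_url: str, api_key: str) -> urllib.request.Request:
    """Build a GET request for the management API model list."""
    url = gateway_url.rstrip("/") + "/maas-api/v1/models"
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {api_key}"}
    )


def key_is_rejected(gateway_url: str, api_key: str) -> bool:
    """Return True if the gateway answers 401 Unauthorized for this key."""
    try:
        with urllib.request.urlopen(models_request(gateway_url, api_key), timeout=10):
            return False  # the request succeeded, so the key is still valid
    except urllib.error.HTTPError as err:
        return err.code == 401
```

Network errors other than HTTP responses propagate as exceptions, so a failed connection is not mistaken for a revoked key.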
1.10.4. Configure the API key expiration limit
In Red Hat OpenShift AI, you can set a maximum expiration period for API keys to enforce security policies and prevent users from creating keys with extended expiration periods.
Prerequisites
- You have cluster administrator privileges for the OpenShift cluster where OpenShift AI is installed.
- You have deployed Models-as-a-Service.
- The Tenant custom resource exists in the models-as-a-service namespace.
Changes to maxExpirationDays apply only to API keys created after the change. Existing keys retain their original expiration dates.
Procedure
Edit the Tenant custom resource to configure the API key expiration limit, using one of the following methods:
- Using the OpenShift console
- In the OpenShift console, navigate to Administration → CustomResourceDefinitions.
- Search for Tenant and click the resource name.
- Click the Instances tab.
- Click default-tenant.
- Click the YAML tab.
- In the YAML editor, locate the spec section and add or update the apiKeys configuration:
spec:
  apiKeys:
    maxExpirationDays: 90
The YAML file uses the following field:
- maxExpirationDays
- Specifies the maximum allowed expiration in days for API keys. Common values: 30 for one month, 90 for three months, or 365 for one year. If not set, the default is 90 days.
- Click Save.
- Using the OpenShift CLI
Run the following command:
$ oc patch tenants.maas.opendatahub.io default-tenant -n models-as-a-service \
  --type merge \
  -p '{"spec":{"apiKeys":{"maxExpirationDays":90}}}'
Verification
Confirm that the maxExpirationDays value is set on the Tenant resource:
$ oc get tenants.maas.opendatahub.io default-tenant -n models-as-a-service -o jsonpath='{.spec.apiKeys.maxExpirationDays}'
The output is the value you configured, for example, 90.
- Optional: Test that the limit is enforced by attempting to create an API key with an expiration period that exceeds the configured maximum. The request receives a 400 Bad Request response with an error message indicating the expiration exceeds the configured maximum.
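The limit check described in this section can be pictured as a small validation function. This is a sketch only: the function name and error strings are hypothetical, and the 90-day fallback is taken from the default stated above.

```python
from typing import Optional

DEFAULT_MAX_EXPIRATION_DAYS = 90  # default maximum stated in this section


def expiration_error(
    requested_days: int, max_expiration_days: Optional[int] = None
) -> Optional[str]:
    """Return an error message if the requested expiration is not allowed."""
    limit = (
        DEFAULT_MAX_EXPIRATION_DAYS
        if max_expiration_days is None
        else max_expiration_days
    )
    if requested_days < 1:
        return "expiration must be at least 1 day"
    if requested_days > limit:
        return (
            f"expiration of {requested_days} days exceeds "
            f"the configured maximum of {limit} days"
        )
    return None  # the request is within the limit
```

A request for 120 days fails with the default limit but passes if `maxExpirationDays` is raised to 365.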
1.11. Manage Models-as-a-Service using the CLI and API
In Red Hat OpenShift AI, you can manage Models-as-a-Service (MaaS) configurations by using the MaaS API and custom resources. This approach is useful for the following scenarios:
- External OpenID Connect (OIDC) users who cannot access the dashboard
- Automating repetitive configuration tasks
- Integrating MaaS management into GitOps workflows and CI/CD pipelines
1.11.1. Models-as-a-Service API overview
In Red Hat OpenShift AI, you can use the Models-as-a-Service (MaaS) API to manage subscriptions, authorization policies, and API keys, and to call model endpoints programmatically. You can access the API directly through HTTP clients such as curl or integrate it with automation tools and GitOps workflows.
1.11.1.1. API structure
MaaS provides two APIs:
- Management API
- The MaaS management API (/maas-api/v1) provides endpoints for management operations such as creating API keys, listing models, and managing user access. This API accepts both OIDC tokens and API keys for authentication.
- Inference API
- Model-specific inference endpoints (/llm/<model-name>/v1) use OpenAI-compatible request and response formats for model interactions. These endpoints require API key authentication.
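To illustrate the inference API, the following sketch builds a chat request against a model-specific endpoint by using the Python standard library. The `/chat/completions` suffix and the payload shape follow the OpenAI convention and are assumptions here, as is the use of the model name in the `model` field; this document states only that the endpoints are OpenAI-compatible. The `chat_request` function is a hypothetical helper.

```python
import json
import urllib.request


def chat_request(
    gateway_url: str, model_name: str, api_key: str, prompt: str
) -> urllib.request.Request:
    """Build a POST request for an OpenAI-style chat completion."""
    # Assumed path: the documented /llm/<model-name>/v1 prefix plus the
    # OpenAI-style /chat/completions suffix.
    url = f"{gateway_url.rstrip('/')}/llm/{model_name}/v1/chat/completions"
    body = json.dumps(
        {
            "model": model_name,  # assumed value for the model field
            "messages": [{"role": "user", "content": prompt}],
        }
    ).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",  # inference requires an API key
            "Content-Type": "application/json",
        },
    )
```

Pass the returned request to `urllib.request.urlopen` to send it.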
1.11.1.2. Authentication methods
The MaaS API supports two authentication methods:
- OIDC tokens
- Users authenticated through an external OIDC provider authenticate with JWTs obtained from their identity provider. OIDC tokens are required for management API operations that use external OIDC authentication.
- API keys
- Users can authenticate all management API operations and all inference operations by using API keys with the sk-oai- prefix. API keys can be created through the MaaS API or, for OpenShift-authenticated users, through the dashboard.
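The rules above can be summarized in a short sketch: the management API accepts OIDC tokens and API keys, and the inference API accepts only API keys. The function names are hypothetical, and recognizing a JWT by its three dot-separated segments is a simplification for illustration, not a validation.

```python
API_KEY_PREFIX = "sk-oai-"


def credential_kind(token: str) -> str:
    """Classify a bearer token by its shape only; no signature is checked."""
    if token.startswith(API_KEY_PREFIX):
        return "api-key"
    # Compact JWTs consist of three non-empty segments separated by dots.
    if token.count(".") == 2 and all(token.split(".")):
        return "oidc-jwt"
    return "unknown"


def accepted(api: str, token: str) -> bool:
    """Return True if the named API accepts this kind of credential."""
    kind = credential_kind(token)
    if api == "inference":
        return kind == "api-key"
    if api == "management":
        return kind in ("api-key", "oidc-jwt")
    raise ValueError(f"unknown API: {api}")
```

A client that holds only an OIDC token can therefore use the management API to create an API key before it calls any inference endpoint.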
1.11.2. MaaS custom resource workflow
To make a model available to your users through Models-as-a-Service (MaaS), you can create and apply custom resources by using YAML and OpenShift CLI (oc) in the following order:
- Publish a model by creating a MaaSModelRef resource that references your deployed inference server.
- Create a subscription by defining a MaaSSubscription resource that assigns groups or users to the published model with token quota.
- Create an authorization policy by defining a MaaSAuthPolicy resource that grants API gateway access to the same groups or users.
For a complete list of MaaS custom resources, see Models-as-a-Service custom resources.
1.11.3. Publish a model to Models-as-a-Service using YAML
You can publish a deployed model to Models-as-a-Service (MaaS) by creating a MaaSModelRef custom resource. After you publish the model, administrators can add it to MaaS subscriptions and authorization policies.
Prerequisites
- You have cluster administrator privileges for the OpenShift cluster where OpenShift AI is installed.
- You have access to OpenShift CLI (oc).
- You have deployed Models-as-a-Service.
You have deployed a model using one of the supported serving methods:
- LLM-D distributed inference
- vLLM runtime
- External LLM provider
Procedure
Create a YAML file named, for example, maasmodelref.yaml, defining the MaaSModelRef custom resource:
apiVersion: models.opendatahub.io/v1alpha1
kind: MaaSModelRef
metadata:
  name: <model-ref-name>
  namespace: <model-namespace>
spec:
  modelRef:
    kind: <backend-kind>
    name: <backend-resource-name>
where:
- metadata.name
- Specifies a unique resource name for the model reference.
- metadata.namespace
- Specifies the namespace where the model reference is created. This must be the same namespace as the backend resource.
- spec.modelRef.kind
- Specifies the backend resource type. Allowed values: LLMInferenceService, ExternalModel.
- spec.modelRef.name
- Specifies the name of the backend resource (for example, an LLMInferenceService or ExternalModel resource).
Note: For models served using the vLLM runtime or LLM-D distributed inference, use kind: LLMInferenceService. For models hosted by external providers such as OpenAI or Anthropic, use kind: ExternalModel.
Example configuration for vLLM model
apiVersion: models.opendatahub.io/v1alpha1
kind: MaaSModelRef
metadata:
  name: llama-3-8b-instruct
  namespace: model-serving
spec:
  modelRef:
    kind: LLMInferenceService
    name: llama-3-8b-instruct
Example configuration for external model
apiVersion: models.opendatahub.io/v1alpha1
kind: MaaSModelRef
metadata:
  name: gpt-4-turbo
  namespace: external-models
spec:
  modelRef:
    kind: ExternalModel
    name: gpt-4-turbo
Optional: If you need to override the auto-discovered endpoint URL, add the endpointOverride field:
apiVersion: models.opendatahub.io/v1alpha1
kind: MaaSModelRef
metadata:
  name: <model-ref-name>
  namespace: <model-namespace>
spec:
  modelRef:
    kind: <backend-kind>
    name: <backend-resource-name>
  endpointOverride: <custom-endpoint-url>
Apply the YAML file:
$ oc apply -f maasmodelref.yaml
Verification
Verify that the model reference was created:
$ oc get maasmodelref <model-ref-name> -n <model-namespace>
Check the model reference status:
$ oc get maasmodelref <model-ref-name> -n <model-namespace> -o jsonpath='{.status.phase}'
Expected output:
Ready
Verify that the model appears in the MaaS model list:
$ oc get maasmodelref -A
Verify that the model is accessible through the MaaS gateway:
$ curl -H "Authorization: Bearer <api-key>" \
  https://<maas-gateway-url>/maas-api/v1/models | jq '.data[].id'
The output includes the published model ID.
Publishing a model creates a reference for MaaS, but does not grant any user access. To make the model accessible to users, an administrator must include the model in a subscription and create a matching authorization policy.
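For automation, the same manifest can be generated rather than written by hand. The following sketch builds a MaaSModelRef manifest as a Python dictionary and prints it as JSON, which `oc apply -f` accepts in addition to YAML. The `maas_model_ref` helper is hypothetical. The field names, the API version, and the allowed kinds are taken from the procedure above, and the sample names come from the vLLM example.

```python
import json
from typing import Optional

ALLOWED_KINDS = {"LLMInferenceService", "ExternalModel"}


def maas_model_ref(
    name: str,
    namespace: str,
    backend_kind: str,
    backend_name: Optional[str] = None,
) -> dict:
    """Build a MaaSModelRef manifest; the backend name defaults to the ref name."""
    if backend_kind not in ALLOWED_KINDS:
        raise ValueError(f"kind must be one of {sorted(ALLOWED_KINDS)}")
    return {
        "apiVersion": "models.opendatahub.io/v1alpha1",
        "kind": "MaaSModelRef",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "modelRef": {"kind": backend_kind, "name": backend_name or name}
        },
    }


if __name__ == "__main__":
    manifest = maas_model_ref(
        "llama-3-8b-instruct", "model-serving", "LLMInferenceService"
    )
    print(json.dumps(manifest, indent=2))
```

Saving the output to a file and running `oc apply -f` on it has the same effect as applying the equivalent YAML file.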
1.11.4. Create a subscription using YAML
You can create a Models-as-a-Service (MaaS) subscription by defining a MaaSSubscription custom resource in YAML and applying it using OpenShift CLI (oc). This approach is useful for GitOps workflows and automation.
Prerequisites
- You have cluster administrator privileges for the OpenShift cluster where OpenShift AI is installed.
- You have access to OpenShift CLI (oc).
- You have deployed Models-as-a-Service.
- You have published at least one model to Models-as-a-Service (MaaS).
Procedure
Create a YAML file defining the MaaSSubscription custom resource:
apiVersion: models.opendatahub.io/v1alpha1
kind: MaaSSubscription
metadata:
  name: <subscription-name>
  namespace: models-as-a-service
spec:
  owner:
    groups:
      - kind: Group
        name: <group-1>
      - kind: Group
        name: <group-2>
    users:
      - <user-1>
      - <user-2>
  modelRefs:
    - name: <model-name>
      namespace: <model-namespace>
      tokenRateLimits:
        - limit: <token-limit>
          window: "<time-window>"
        - limit: <token-limit>
          window: "<time-window>"
  priority: <priority-level>

where:
- metadata.name: Specifies the Kubernetes resource name for the subscription.
- metadata.namespace: Specifies the namespace where the subscription is created. This must be models-as-a-service.
- spec.owner.groups: Lists the groups whose members can use this subscription. Each group entry requires kind: Group and a name field that references an OpenShift Group object or external OIDC group.
- spec.owner.users: Lists individual users who can use this subscription. Users must be valid Kubernetes user identities.
- spec.modelRefs: Defines the models available in this subscription with their token rate limits.
- spec.modelRefs.name: Specifies the MaaSModelRef resource name.
- spec.modelRefs.namespace: Specifies the namespace containing the MaaSModelRef resource.
- spec.modelRefs.tokenRateLimits: Configures token consumption limits with time windows. At least one rate limit is required per model.
- spec.modelRefs.tokenRateLimits.limit: Specifies the maximum number of tokens allowed within the time window.
- spec.modelRefs.tokenRateLimits.window: Specifies the time window for the rate limit. Supported units: s (seconds), m (minutes), h (hours). Pattern: 1-9999 followed by a unit (for example, 30s, 5m, 1h, 24h).
- spec.priority: Sets the subscription priority level when a user has multiple subscriptions. Higher numbers indicate higher priority. Default: 0.

Note
At least one of groups or users must be specified under owner. Multiple token rate limits can be configured for each model to enforce different time windows (for example, per-minute and per-day limits).

Important
The window field no longer accepts d (days) as a unit. Use hours instead. For example, use 24h instead of 1d, or 168h for a week.

Example configuration
apiVersion: models.opendatahub.io/v1alpha1
kind: MaaSSubscription
metadata:
  name: data-science-team
  namespace: models-as-a-service
spec:
  owner:
    groups:
      - kind: Group
        name: data-scientists
      - kind: Group
        name: ml-engineers
    users:
      - alice@example.com
  modelRefs:
    - name: llama-3-8b-instruct
      namespace: model-serving
      tokenRateLimits:
        - limit: 1000000
          window: "1h"
        - limit: 5000000
          window: "24h"
    - name: granite-7b-lab
      namespace: model-serving
      tokenRateLimits:
        - limit: 500000
          window: "1h"
  priority: 10
Apply the YAML file:
$ oc apply -f <subscription-file>.yaml

Create a matching authorization policy:
$ oc apply -f - <<EOF
apiVersion: models.opendatahub.io/v1alpha1
kind: MaaSAuthPolicy
metadata:
  name: <policy-name>
  namespace: models-as-a-service
spec:
  subjects:
    groups:
      - name: <group-1>
      - name: <group-2>
    users:
      - <user-1>
      - <user-2>
  modelRefs:
    - name: <model-name>
      namespace: <model-namespace>
EOF

Authorization policies are required in addition to subscriptions to grant API gateway access.
Verification
Verify that the subscription was created:
$ oc get maassubscription <subscription-name> -n models-as-a-service

Check the subscription status:
$ oc get maassubscription <subscription-name> -n models-as-a-service -o jsonpath='{.status.phase}'

Expected output:
Ready
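The window pattern and the hours-instead-of-days rule described above can be checked before a subscription is applied. The following is a minimal sketch, not part of MaaS: the function names are hypothetical, and it assumes that leading zeros are not allowed in the numeric part.

```python
import re

# Pattern from the field description: 1-9999 followed by s, m, or h.
# Assumption: no leading zeros in the numeric part.
WINDOW_RE = re.compile(r"[1-9][0-9]{0,3}[smh]")


def is_valid_window(window: str) -> bool:
    """Return True if the window string matches the documented pattern."""
    return WINDOW_RE.fullmatch(window) is not None


def days_to_window(days: int) -> str:
    """Convert a number of days to an hour-based window string."""
    hours = days * 24
    if not 1 <= hours <= 9999:
        raise ValueError(f"{days} days does not fit the 1-9999 hour range")
    return f"{hours}h"
```

For example, `days_to_window(7)` returns `168h`, which matches the weekly window suggested in the note about the removed `d` unit.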
1.11.5. Create an authorization policy using YAML
You can create a Models-as-a-Service (MaaS) authorization policy by defining a MaaSAuthPolicy custom resource in YAML and applying it using OpenShift CLI (oc). Authorization policies grant groups and users access to model endpoints through the API gateway. Subscriptions control token quota limits separately from gateway access.
Prerequisites
- You have cluster administrator privileges for the OpenShift cluster where OpenShift AI is installed.
- You have access to OpenShift CLI (oc).
- You have deployed Models-as-a-Service.
- At least one MaaS subscription exists for the groups and models that you plan to authorize.
Procedure
Create a YAML file defining the MaaSAuthPolicy custom resource:
apiVersion: models.opendatahub.io/v1alpha1
kind: MaaSAuthPolicy
metadata:
  name: <policy-name>
  namespace: models-as-a-service
spec:
  subjects:
    groups:
      - name: <group-1>
      - name: <group-2>
    users:
      - <user-1>
      - <user-2>
  modelRefs:
    - name: <model-name-1>
      namespace: <model-namespace>
    - name: <model-name-2>
      namespace: <model-namespace>

where:
- metadata.name: Specifies the Kubernetes resource name for the authorization policy.
- metadata.namespace: Specifies the namespace where the authorization policy is created. This must be models-as-a-service.
- spec.subjects.groups: Lists the groups authorized to access model endpoints. Each group entry requires a name field that references a Kubernetes Group object or external OIDC group.
- spec.subjects.users: Lists the individual users authorized to access model endpoints. Users must be valid Kubernetes user identities.
- spec.modelRefs: Defines the model endpoints that authorized subjects can access.
- spec.modelRefs.name: Specifies the MaaSModelRef resource name.
- spec.modelRefs.namespace: Specifies the namespace containing the MaaSModelRef resource.

Note
The authorization policy uses OR logic: any matching group or user grants access to all specified models. At least one of groups or users must be specified under subjects.

Example configuration
apiVersion: models.opendatahub.io/v1alpha1
kind: MaaSAuthPolicy
metadata:
  name: ml-team-auth-policy
  namespace: models-as-a-service
spec:
  subjects:
    groups:
      - name: ml-engineers
      - name: data-scientists
    users:
      - alice@example.com
  modelRefs:
    - name: llama-3-8b-instruct
      namespace: model-serving
    - name: granite-7b-lab
      namespace: model-serving
Optional: Add metering metadata for billing attribution and cost tracking:
apiVersion: models.opendatahub.io/v1alpha1
kind: MaaSAuthPolicy
metadata:
  name: <policy-name>
  namespace: models-as-a-service
spec:
  subjects:
    groups:
      - name: <group-name>
  modelRefs:
    - name: <model-name>
      namespace: <model-namespace>
  meteringMetadata:
    organizationId: <organization-id>
    costCenter: <cost-center>
    labels:
      <key>: <value>

where:
- meteringMetadata.organizationId: Specifies the organization identifier for billing purposes.
- meteringMetadata.costCenter: Specifies the cost center for billing attribution.
- meteringMetadata.labels: Provides additional key-value pairs for tracking and reporting.
Apply the YAML file:
$ oc apply -f <auth-policy-file>.yaml
Authorization policies and subscriptions are independent resources. If you modify a subscription to add or remove groups or models, you must manually update the corresponding authorization policy to keep them synchronized.
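Because the two resources are not synchronized automatically, a small drift check can help before or after an update. The following sketch is illustrative only: `access_drift` is a hypothetical helper, and it assumes that both resources were exported with `oc get <resource> <name> -o json` and loaded into Python dictionaries with the field layout shown in the YAML examples in this chapter.

```python
def access_drift(subscription: dict, auth_policy: dict) -> dict:
    """Report owners and models that the subscription lists but the policy does not."""
    sub_spec = subscription.get("spec", {})
    pol_spec = auth_policy.get("spec", {})
    owner = sub_spec.get("owner", {})
    subjects = pol_spec.get("subjects", {})

    # Groups are objects with a name field; users are plain strings.
    sub_groups = {g["name"] for g in owner.get("groups", [])}
    pol_groups = {g["name"] for g in subjects.get("groups", [])}
    sub_users = set(owner.get("users", []))
    pol_users = set(subjects.get("users", []))

    # Models are identified by namespace and name together.
    sub_models = {f'{m["namespace"]}/{m["name"]}' for m in sub_spec.get("modelRefs", [])}
    pol_models = {f'{m["namespace"]}/{m["name"]}' for m in pol_spec.get("modelRefs", [])}

    return {
        "groups": sorted(sub_groups - pol_groups),
        "users": sorted(sub_users - pol_users),
        "models": sorted(sub_models - pol_models),
    }
```

An empty list under every key means that each group, user, and model in the subscription also appears in the authorization policy.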
Verification
Verify that the authorization policy was created:
$ oc get maasauthpolicy <policy-name> -n models-as-a-service

Check the authorization policy status:
$ oc get maasauthpolicy <policy-name> -n models-as-a-service -o jsonpath='{.status.phase}'

Expected output:
Active

Verify that authorized users can access the models:
$ curl -H "Authorization: Bearer <api-key>" \
  https://<maas-gateway-url>/maas-api/v1/models
1.11.6. Manage API keys using the Models-as-a-Service API
In Red Hat OpenShift AI, administrators and users can create, list, and revoke API keys by using the Models-as-a-Service (MaaS) API with curl commands. Administrators can manage API keys for all users, while users can manage their own API keys.
Prerequisites
- You have deployed Models-as-a-Service (MaaS) on your OpenShift cluster. For more information, see Prerequisites for MaaS.
- You have access to OpenShift CLI (oc).
- You have published at least one model to MaaS.
- You have at least one subscription configured.
- You have an authorization policy that grants users or groups access to model endpoints.
Procedure
Obtain the MaaS gateway URL:
$ oc get gateway maas-default-gateway -n openshift-ingress -o jsonpath='{.status.addresses[0].value}'

Obtain an authentication token:
- For OpenShift users:
$ oc whoami -t
- For external OIDC users:
$ curl -X POST "<oidc_token_endpoint>" \
  -d "client_id=<client_id>" \
  -d "client_secret=<client_secret>" \
  -d "grant_type=client_credentials"
List the available subscriptions:
$ curl -X GET https://<maas_gateway_url>/maas-api/v1/subscriptions \
  -H "Authorization: Bearer <auth_token>"

Note the subscription name you want to associate with the API key. Administrators can view all subscriptions, while regular users can view only subscriptions associated with their user or group.
To create an API key, send a POST request to the API keys endpoint:
$ curl -X POST https://<maas_gateway_url>/maas-api/v1/api-keys \
  -H "Authorization: Bearer <auth_token>" \
  -H "Content-Type: application/json" \
  -d '{ "name": "<key_name>", "subscription": "<subscription_name>", "expiresInDays": <days> }'

where:
- <maas_gateway_url>: Specifies your MaaS gateway URL.
- <auth_token>: Specifies the authentication token obtained in step 2.
- <key_name>: Specifies a descriptive name for the API key.
- <subscription_name>: Specifies the subscription name that scopes the key's access.
- <days>: Specifies the number of days until the key expires. The maximum expiration period is configured in the Tenant custom resource. If not set, the default maximum is 90 days.
Important
The plain text key is returned only in this response and cannot be retrieved later. Save the key in a secrets manager or other secure storage before sending further requests.
To list API keys, send a GET request to the API keys endpoint:
$ curl -X GET https://<maas_gateway_url>/maas-api/v1/api-keys \
  -H "Authorization: Bearer <auth_token>"

The API returns a list of API keys with their names, subscription associations, creation dates, and expiration dates.
To revoke an API key, send a DELETE request to the API key endpoint:
$ curl -X DELETE https://<maas_gateway_url>/maas-api/v1/api-keys/<key_id> \
  -H "Authorization: Bearer <auth_token>"

where:
- <key_id>: Specifies the ID of the API key to revoke. You can obtain the key ID from the list API keys response.
API keys are scoped to subscriptions. If a subscription is deleted, all API keys created for that subscription are invalidated.
Verification
Verify that the API key appears in the list of keys:
$ curl -X GET https://<maas_gateway_url>/maas-api/v1/api-keys \
  -H "Authorization: Bearer <auth_token>"

Test the API key by making a request to list available models:
$ curl -H "Authorization: Bearer <api_key>" \
  https://<maas_gateway_url>/maas-api/v1/models
1.11.7. Models-as-a-Service API endpoints reference
In Red Hat OpenShift AI, the Models-as-a-Service (MaaS) API provides programmatic access to model management, API key lifecycle operations, and subscription information. Management API endpoints are prefixed with /maas-api/v1 and require authentication via OpenID Connect (OIDC) token or API key.
1.11.7.1. Authentication
All API requests except the health endpoint require authentication by using the Authorization header:
Authorization: Bearer <token>
where <token> is either an OpenID Connect (OIDC) JWT or a Models-as-a-Service (MaaS) API key. MaaS API keys are prefixed with sk-oai- and are returned by the API key creation endpoint.
1.11.7.2. GET /maas-api/health
Purpose: Cluster health check for load balancers and monitoring systems
Authentication: Not required
Example request:
$ curl https://<maas-gateway-url>/maas-api/health
Example response:
{
"status": "healthy"
}
1.11.7.3. GET /maas-api/v1/models
Purpose: List all models accessible to the authenticated user based on their subscriptions
Authentication: Required
Headers:
- X-MaaS-Subscription (optional): When using a user token, specifies which subscription to use for filtering models. Not applicable when using API keys.
Example request:
$ curl -H "Authorization: Bearer <token>" \
https://<maas-gateway-url>/maas-api/v1/models
Example response:
{
"object": "list",
"data": [
{
"id": "llama-3-8b-instruct",
"object": "model",
"created": 1234567890,
"owned_by": "organization",
"ready": true,
"url": "https://<maas-gateway-url>/llm/llama-3-8b-instruct",
"kind": "LLMInferenceService",
"modelDetails": {
"displayName": "Llama 3 8B Instruct",
"description": "Llama 3 instruction-tuned model",
"genaiUseCase": "chat",
"contextWindow": 8192
},
"subscriptions": [
{
"name": "data-science-team",
"displayName": "Data Science Team Subscription",
"description": "Subscription for data science team"
}
]
}
]
}
1.11.7.4. POST /maas-api/v1/api-keys
Purpose: Create a new API key for programmatic access
Authentication: Required
Request body:
{
"name": "<key-name>",
"description": "<description>",
"subscription": "<subscription-name>",
"expiresIn": "<duration>",
"ephemeral": <boolean>
}
where:
- name: Specifies a human-readable name for the API key. Required for regular keys, optional for ephemeral keys.
- description: Provides an optional description of the key's purpose.
- subscription: Specifies the subscription name to associate with the API key. If not specified, the system selects a subscription based on the user's group memberships.
- expiresIn: Specifies the expiration duration as a string (for example, 30d, 90d, 1h, 24h). If not specified, keys expire after 90 days.
- ephemeral: When true, creates a short-lived key for temporary use. Default: false.
Example request:
$ curl -X POST https://<maas-gateway-url>/maas-api/v1/api-keys \
-H "Authorization: Bearer <token>" \
-H "Content-Type: application/json" \
-d '{ "name": "production-key", "description": "Key for production workloads", "subscription": "data-science-team", "expiresIn": "90d" }'
Example response:
{
"key": "sk-oai-abc123def456...",
"keyPrefix": "sk-oai-abc123",
"id": "key-abc123",
"name": "production-key",
"subscription": "data-science-team",
"createdAt": "2026-05-15T12:00:00Z",
"expiresAt": "2026-08-13T12:00:00Z",
"ephemeral": false
}
The key field is returned only once during creation and cannot be retrieved later. Save the key in a secrets manager before continuing.
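The same request can be prepared from Python with the standard library. The following sketch only builds the request object and does not send it; `build_create_key_request` and its parameters are hypothetical names, and the body fields follow the request body described in this section.

```python
import json
import urllib.request


def build_create_key_request(gateway_url, token, name,
                             subscription=None, expires_in=None):
    """Prepare a POST request for the API key creation endpoint."""
    body = {"name": name}
    # Optional fields are included only when a value is given.
    if subscription is not None:
        body["subscription"] = subscription
    if expires_in is not None:
        body["expiresIn"] = expires_in
    return urllib.request.Request(
        url=f"{gateway_url.rstrip('/')}/maas-api/v1/api-keys",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send the request, pass the returned object to `urllib.request.urlopen`, parse the JSON response, and store the `key` value in secure storage immediately, because the response is the only place where the key appears.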
1.11.7.5. POST /maas-api/v1/api-keys/search
Purpose: Search and filter API keys with pagination and sorting support
Authentication: Required
Request body:
{
"filters": {
"username": "<username>",
"status": ["<status-1>", "<status-2>"],
"includeEphemeral": <boolean>
},
"sort": {
"by": "<sort-field>",
"order": "<sort-order>"
},
"pagination": {
"limit": <limit>,
"offset": <offset>
}
}
where:
- filters.username: Filters keys by owner username. Administrators can filter by any username; regular users can only filter their own keys.
- filters.status: Filters by key status. Array of values: active, revoked, expired.
- filters.includeEphemeral: When true, includes ephemeral keys in results. Default: false.
- sort.by: Specifies the sort field. Allowed values: created_at, expires_at, last_used_at, name. Default: created_at.
- sort.order: Specifies the sort direction. Allowed values: asc, desc. Default: desc.
- pagination.limit: Specifies the maximum number of results to return. Default: 50. Maximum: 100.
- pagination.offset: Specifies the number of results to skip for pagination. Default: 0.
Example request:
$ curl -X POST https://<maas-gateway-url>/maas-api/v1/api-keys/search \
-H "Authorization: Bearer <token>" \
-H "Content-Type: application/json" \
-d '{ "filters": {"status": ["active"]}, "sort": {"by": "created_at", "order": "desc"}, "pagination": {"limit": 20, "offset": 0} }'
Example response:
{
"object": "list",
"data": [
{
"id": "key-abc123",
"name": "production-key",
"description": "Key for production workloads",
"username": "alice@example.com",
"subscription": "data-science-team",
"groups": ["data-scientists", "ml-engineers"],
"creationDate": "2026-05-15T12:00:00Z",
"expirationDate": "2026-08-13T12:00:00Z",
"status": "active",
"lastUsedAt": "2026-05-15T15:30:00Z",
"ephemeral": false
}
],
"has_more": false
}
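The pagination fields and the `has_more` flag can drive a simple loop that collects every matching key. The sketch below is illustrative: `iter_api_keys` is a hypothetical helper, and `search` stands for any caller-supplied function that posts the request body to the search endpoint and returns the parsed JSON response.

```python
from typing import Callable, Iterator


def iter_api_keys(search: Callable[[dict], dict],
                  statuses=("active",),
                  page_size: int = 50) -> Iterator[dict]:
    """Yield API keys page by page until has_more is false."""
    page_size = min(page_size, 100)  # documented maximum for limit
    offset = 0
    while True:
        page = search({
            "filters": {"status": list(statuses)},
            "sort": {"by": "created_at", "order": "desc"},
            "pagination": {"limit": page_size, "offset": offset},
        })
        items = page.get("data", [])
        yield from items
        # Stop on the last page, or on an empty page to avoid looping forever.
        if not page.get("has_more") or not items:
            break
        offset += len(items)
```

Injecting the `search` function keeps the loop independent of the HTTP client, so the same logic works with any library and can be exercised without a cluster.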
1.11.7.6. GET /maas-api/v1/api-keys/{id}
Purpose: Retrieve metadata for a specific API key
Authentication: Required
Example request:
$ curl -H "Authorization: Bearer <token>" \
https://<maas-gateway-url>/maas-api/v1/api-keys/key-abc123
Example response:
{
"id": "key-abc123",
"name": "production-key",
"description": "Key for production workloads",
"username": "alice@example.com",
"subscription": "data-science-team",
"groups": ["data-scientists", "ml-engineers"],
"creationDate": "2026-05-15T12:00:00Z",
"expirationDate": "2026-08-13T12:00:00Z",
"status": "active",
"lastUsedAt": "2026-05-15T15:30:00Z",
"ephemeral": false
}
1.11.7.7. DELETE /maas-api/v1/api-keys/{id}
Purpose: Revoke a specific API key immediately
Authentication: Required
Example request:
$ curl -X DELETE \
-H "Authorization: Bearer <token>" \
https://<maas-gateway-url>/maas-api/v1/api-keys/key-abc123
Example response:
{
"id": "key-abc123",
"name": "production-key",
"description": "Key for production workloads",
"username": "alice@example.com",
"subscription": "data-science-team",
"groups": ["data-scientists", "ml-engineers"],
"creationDate": "2026-05-15T12:00:00Z",
"expirationDate": "2026-08-13T12:00:00Z",
"status": "revoked",
"lastUsedAt": "2026-05-15T15:30:00Z",
"ephemeral": false
}
1.11.7.8. POST /maas-api/v1/api-keys/bulk-revoke
Purpose: Revoke all active API keys for a specific user
Authentication: Required. Administrator privileges required.
Request body:
{
"username": "<username>"
}
Example request:
$ curl -X POST https://<maas-gateway-url>/maas-api/v1/api-keys/bulk-revoke \
-H "Authorization: Bearer <token>" \
-H "Content-Type: application/json" \
-d '{ "username": "alice@example.com" }'
Example response:
{
"message": "Successfully revoked 3 active API key(s) for user alice@example.com"
}
1.11.7.9. GET /maas-api/v1/subscriptions
Purpose: List all subscriptions accessible to the authenticated user
Authentication: Required
Example request:
$ curl -H "Authorization: Bearer <token>" \
https://<maas-gateway-url>/maas-api/v1/subscriptions
Example response:
[
{
"subscription_id_header": "data-science-team",
"display_name": "Data Science Team Subscription",
"subscription_description": "Subscription for data science team members",
"priority": 10,
"model_refs": [
{
"name": "llama-3-8b-instruct",
"namespace": "model-serving",
"token_rate_limits": [
{
"limit": 1000000,
"window": "1h"
},
{
"limit": 5000000,
"window": "24h"
}
],
"billing_rate": {
"per_token": 0.001
}
}
],
"organization_id": "org-123",
"cost_center": "ml-team",
"labels": {
"environment": "production",
"team": "data-science"
}
}
]
The response is an array of subscription objects, not an object with a data field. Each subscription includes complete model reference details with token rate limits and billing information.
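Because the response is a plain array, a client can scan it directly. The following sketch picks the highest-priority subscription that covers a given model, mirroring the rule that higher numbers indicate higher priority. It is a client-side illustration with a hypothetical function name, and the selection that the platform performs on the server might differ.

```python
def pick_subscription(subscriptions: list, model_name: str):
    """Return the ID of the highest-priority subscription that lists the model."""
    candidates = [
        sub for sub in subscriptions
        if any(ref.get("name") == model_name for ref in sub.get("model_refs", []))
    ]
    if not candidates:
        return None
    # A missing priority is treated as 0, the documented default.
    best = max(candidates, key=lambda sub: sub.get("priority", 0))
    return best["subscription_id_header"]
```

With a user token, the returned value could be sent in the `X-MaaS-Subscription` header when listing models.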
1.11.7.10. GET /maas-api/v1/model/{model-id}/subscriptions
Purpose: List all subscriptions that provide access to a specific model
Authentication: Required
Example request:
$ curl -H "Authorization: Bearer <token>" \
https://<maas-gateway-url>/maas-api/v1/model/llama-3-8b-instruct/subscriptions
Example response:
[
{
"subscription_id_header": "data-science-team",
"display_name": "Data Science Team Subscription",
"subscription_description": "Subscription for data science team members",
"priority": 10,
"model_refs": [
{
"name": "llama-3-8b-instruct",
"namespace": "model-serving",
"token_rate_limits": [
{
"limit": 1000000,
"window": "1h"
}
],
"billing_rate": {
"per_token": 0.001
}
}
],
"organization_id": "org-123",
"cost_center": "ml-team",
"labels": {
"environment": "production"
}
}
]
1.11.7.11. Inference endpoints
Model inference endpoints use OpenAI-compatible request and response formats. Each model has a dedicated endpoint path: https://<maas-gateway-url>/llm/{model-name}/v1
1.11.7.12. POST /llm/{model-name}/v1/completions
Purpose: Generate text completions
Authentication: Required. API key with sk-oai- prefix.
Example request:
$ curl -X POST https://<maas-gateway-url>/llm/<model-name>/v1/completions \
-H "Authorization: Bearer sk-oai-..." \
-H "Content-Type: application/json" \
-d '{ "model": "<model-name>", "prompt": "Explain quantum computing", "max_tokens": 100 }'
Example response:
{
"id": "cmpl-abc123",
"object": "text_completion",
"created": 1234567890,
"model": "llama-3-8b-instruct",
"choices": [
{
"text": "Quantum computing is a type of computing that uses quantum mechanics...",
"index": 0,
"finish_reason": "length"
}
],
"usage": {
"prompt_tokens": 4,
"completion_tokens": 100,
"total_tokens": 104
}
}
1.11.7.13. POST /llm/{model-name}/v1/chat/completions
Purpose: Generate chat-based completions
Authentication: Required. API key with sk-oai- prefix.
Example request:
$ curl -X POST https://<maas-gateway-url>/llm/<model-name>/v1/chat/completions \
-H "Authorization: Bearer sk-oai-..." \
-H "Content-Type: application/json" \
-d '{ "model": "<model-name>", "messages": [ {"role": "user", "content": "What is AI?"} ], "max_tokens": 100 }'
Example response:
{
"id": "chatcmpl-abc123",
"object": "chat.completion",
"created": 1234567890,
"model": "llama-3-8b-instruct",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Artificial Intelligence (AI) is the simulation of human intelligence..."
},
"finish_reason": "length"
}
],
"usage": {
"prompt_tokens": 12,
"completion_tokens": 100,
"total_tokens": 112
}
}
1.11.7.14. POST /llm/{model-name}/v1/embeddings
Purpose: Generate text embeddings
Authentication: Required. API key with sk-oai- prefix.
Example request:
$ curl -X POST https://<maas-gateway-url>/llm/<model-name>/v1/embeddings \
-H "Authorization: Bearer sk-oai-..." \
-H "Content-Type: application/json" \
-d '{ "model": "<model-name>", "input": "The quick brown fox" }'
Example response:
{
"object": "list",
"data": [
{
"object": "embedding",
"embedding": [0.0023, -0.0142, 0.0089, ...],
"index": 0
}
],
"model": "llama-3-8b-instruct",
"usage": {
"prompt_tokens": 4,
"total_tokens": 4
}
}
Response formats follow the OpenAI API specification. For complete API reference documentation, see OpenAI API Reference.
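Because the formats are OpenAI-compatible, a chat request body and its response can be handled with a few lines of code. The helpers below are a sketch with hypothetical names. They rely only on the fields shown in the chat completions example, and on the OpenAI convention that a `finish_reason` of `length` means the output stopped at the `max_tokens` limit.

```python
import json


def build_chat_body(model: str, prompt: str, max_tokens: int = 100) -> str:
    """Serialize a single-turn chat completion request body."""
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    })


def summarize_chat_response(response: dict) -> dict:
    """Extract the reply text and token usage from a chat completion response."""
    choice = response["choices"][0]
    return {
        "text": choice["message"]["content"],
        # finish_reason "length" means the reply was cut off at max_tokens.
        "truncated": choice["finish_reason"] == "length",
        "total_tokens": response["usage"]["total_tokens"],
    }
```

The `total_tokens` value is useful for comparing client-side usage against the token rate limits of the subscription.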
1.11.7.15. Error responses
All API endpoints return standard HTTP status codes and JSON error responses:
{
"error": {
"message": "Invalid API key",
"type": "authentication_error",
"code": "invalid_api_key"
}
}
Common HTTP status codes:
| Status | Description |
|---|---|
| 200 | Request successful |
| 201 | Resource created successfully |
| 400 | Invalid request parameters |
| 401 | Missing or invalid authentication |
| 403 | Insufficient permissions |
| 404 | Resource not found |
| 429 | Rate limit exceeded |
| 500 | Server error |
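A client can turn the JSON error body into a small record that is easy to log or act on. The following is a minimal sketch with a hypothetical function name. Treating rate-limit (HTTP 429) and server-side (5xx) responses as retryable is a common client convention assumed here, not a behavior documented for MaaS.

```python
import json


def parse_error(status: int, body_text: str) -> dict:
    """Normalize an error response into a flat dictionary."""
    try:
        payload = json.loads(body_text)
    except ValueError:
        payload = {}  # body was not JSON, for example a proxy error page
    err = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(err, dict):
        err = {}
    return {
        "status": status,
        "message": err.get("message", f"HTTP {status}"),
        "type": err.get("type"),
        "code": err.get("code"),
        # Assumption: only rate-limit and server-side errors are worth retrying.
        "retryable": status == 429 or status >= 500,
    }
```

For the example error body above with an authentication failure status, the result has the code `invalid_api_key` and is not marked as retryable.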
1.12. Monitor Models-as-a-Service usage with observability
You can use the MaaS observability dashboard to monitor subscription-level usage metrics for cost attribution and showback reporting to finance teams.
The MaaS observability dashboard is a Technology Preview feature in Red Hat OpenShift AI. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production.
For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.
1.12.1. Models-as-a-Service observability overview
In Red Hat OpenShift AI, you can view Models-as-a-Service (MaaS) subscription-level usage metrics in the observability dashboard, including token consumption, request counts, and rate limit violations.
The dashboard is embedded in the OpenShift AI console using Perses and queries metrics from Thanos Querier. Access to the dashboard is restricted to cluster administrators.
The observability dashboard is designed for showback reporting, not as a billing-grade metering system. For production chargeback workflows that require precise billing data, access the Limitador metrics endpoint directly rather than using Prometheus or the dashboard CSV export.
The dashboard provides the following capabilities:
- Overview metrics
- View high-level statistics including total tokens consumed, total requests, total errors, success rate percentage, and active users.
- Filtering
- Filter metrics by user, subscription, and model to analyze specific usage patterns.
- Time range selection
- View metrics for configurable time periods ranging from the last 5 minutes to the last 14 days, or specify a custom date range.
- Token consumption details
- View a detailed table showing token consumption by user, subscription, and model, including request counts and rate limit violations.
- Data export
- Export usage data in CSV format for cost attribution and showback reporting to finance teams.
1.12.1.1. Dashboard metrics
The observability dashboard displays the following overview metrics:
| Metric | Description |
|---|---|
| Total Tokens | The total number of tokens consumed across all requests during the selected time period. This includes both input tokens (from user prompts) and output tokens (from model responses). |
| Total Requests | The total number of API requests made to MaaS models during the selected time period. Each API call to a model endpoint counts as one request. |
| Total Errors | The total number of failed requests during the selected time period. This includes requests that failed due to model errors, timeout errors, or other server-side issues. |
| Success Rate | The percentage of successful requests out of all requests made during the selected time period. |
| Active Users | The number of unique users who made at least one request during the selected time period. Users are identified by their username from API key ownership or OIDC authentication. |
The Token Consumption by User table displays detailed, per-user usage data with the following columns:
| Column | Description |
|---|---|
| User | The username of the user who made the requests. For API key-based requests, this is the user who created the API key. For OIDC-authenticated requests, this is the user’s OIDC identity. |
| Subscription | The subscription used for the requests. If a user belongs to multiple subscriptions, separate rows appear for each subscription. |
| Model | The model accessed by the user’s requests. |
| Tokens | The total number of tokens consumed by this user for this subscription and model combination. Click the column header to sort the table by token consumption. |
| Requests | The number of API requests made by this user for this subscription and model combination. |
| Rate Limited | The number of requests that were rejected due to rate limiting (HTTP 429 responses). Rate-limited requests count toward the user’s request total without consuming tokens. |
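For showback reporting, the per-user rows can be rolled up per subscription. The sketch below assumes that the exported CSV uses the same column names as the table above (`Subscription` and `Tokens` in particular). The actual export headers are not documented here, so adjust the names to match the real file. The function name is hypothetical.

```python
import csv
import io
from collections import defaultdict


def tokens_by_subscription(csv_text: str) -> dict:
    """Sum the Tokens column for each value of the Subscription column."""
    totals = defaultdict(int)
    # Assumed headers: User, Subscription, Model, Tokens, Requests, Rate Limited
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["Subscription"]] += int(row["Tokens"])
    return dict(totals)
```

The resulting totals are suitable for showback summaries only; as noted earlier, the dashboard export is not a billing-grade data source.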
1.12.1.2. Underlying Prometheus metrics
The observability dashboard queries the following Prometheus metrics collected from Kuadrant and MaaS components:
| Metric | Description |
|---|---|
| authorized_hits | Total number of tokens consumed. Labeled by subscription, model, and limitador_namespace. This metric is collected from Limitador. |
| | Total number of API requests made to models. Labeled by subscription and limitador_namespace. This metric counts all successful requests that passed rate limiting and authentication. |
| limited_calls | Total number of requests rejected due to rate limiting (HTTP 429 responses). Labeled by limitador_namespace. This metric indicates when users exceed their subscription token limits. |
| | Request latency at the API gateway. Tagged with subscription dimension for performance analysis. This metric helps identify performance issues by subscription. |
| | Time spent in |
The user label is disabled by default in MaaS metrics (captureUser: false). To enable per-user metrics collection, set captureUser to true in the telemetry configuration of the Tenant custom resource. The model label appears only on the authorized_hits metric due to Kuadrant wasm-shim limitations.
These metrics are scraped by Prometheus from the Limitador, Authorino, and gateway components. The observability dashboard aggregates and visualizes these metrics to provide usage insights for cost attribution and capacity planning.
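The same metrics can be queried outside the dashboard through the standard Prometheus HTTP API. The sketch below builds a query URL for token consumption per subscription. It assumes that `authorized_hits` is a counter carrying the `subscription` label described above, and that `base_url` points at a Prometheus-compatible endpoint such as the Thanos Querier route of the cluster. The function names are hypothetical, and the request would still need a bearer token.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def tokens_per_subscription_query(window: str = "1h") -> str:
    """Build a PromQL expression for tokens consumed per subscription."""
    return f"sum by (subscription) (increase(authorized_hits[{window}]))"


def build_query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"
```

`urlencode` escapes the spaces, parentheses, and brackets in the expression, so the URL can be passed to any HTTP client as is.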
1.12.2. Enable Kuadrant observability for Models-as-a-Service
In Red Hat OpenShift AI, you can enable observability in the Kuadrant custom resource to collect rate-limiting metrics for Models-as-a-Service usage tracking and monitoring.
Kuadrant uses Limitador, a rate-limiting service, to enforce the token limits defined in Models-as-a-Service (MaaS) subscriptions. When observability is enabled, Kuadrant creates a PodMonitor that configures Prometheus to scrape metrics from Limitador. These metrics track token consumption, request counts, and rate-limit violations, which are displayed in the MaaS observability dashboard for cost attribution and usage monitoring.
Prerequisites
- You have cluster administrator privileges for the OpenShift cluster where OpenShift AI is installed.
- You have installed Red Hat OpenShift AI.
- You have configured the observability stack for OpenShift AI. For more information, see Managing observability.
Procedure
Enable Kuadrant observability using the OpenShift console:
- In the OpenShift console, navigate to Administration → CustomResourceDefinitions.
- Search for Kuadrant and click the resource name.
- Click the Instances tab.
- Click kuadrant.
- Click the YAML tab.
- In the YAML editor, locate the spec section and add or update the observability configuration:
spec:
  observability:
    enable: true

The patch uses the following fields:
- enable: Set to true to enable the Kuadrant observability stack, which creates a PodMonitor resource that configures Prometheus to scrape rate-limiting metrics from Limitador. These metrics are used by the MaaS observability dashboard to track token consumption, request counts, and rate-limit violations.

Click Save.

Alternatively, you can configure the resource using the command line:
$ oc patch kuadrant kuadrant -n kuadrant-system \
  --type merge \
  -p '{"spec":{"observability":{"enable":true}}}'
Verification
Verify that the Limitador PodMonitor was created:
$ oc get podmonitor kuadrant-limitador-monitor -n kuadrant-system

Expected output shows the PodMonitor resource:
NAME                         AGE
kuadrant-limitador-monitor   2m

Verify that the Kuadrant custom resource shows observability as enabled:
$ oc get kuadrant kuadrant -n kuadrant-system -o jsonpath='{.spec.observability.enable}'

Expected output:
true

Verify that Prometheus is scraping metrics from Limitador:
- In the OpenShift console, navigate to Observe → Metrics.
- Run the following query to verify rate-limiting metrics are available:
limited_calls

You should see metrics showing rate-limit violations by user and subscription. If no data appears, it is likely that no models have been accessed yet. The metric appears once users begin making requests to MaaS models.
1.12.3. Enable telemetry for Models-as-a-Service
In Red Hat OpenShift AI, you can enable telemetry in the Tenant custom resource to collect usage metrics from Models-as-a-Service (MaaS) inference requests.
Telemetry configures the MaaS gateway to generate Prometheus metrics about model usage, including token consumption, request counts, and model-specific usage patterns. These metrics are displayed in the MaaS observability dashboard.
Prerequisites
- You have cluster administrator privileges for the OpenShift cluster where OpenShift AI is installed.
- You have installed Red Hat OpenShift AI.
- You have deployed Models-as-a-Service.
Procedure
Enable telemetry using the OpenShift console:
- In the OpenShift console, navigate to Administration → CustomResourceDefinitions.
- Search for Tenant and click the resource name.
- Click the Instances tab.
- Click default-tenant.
- Click the YAML tab.
- In the YAML editor, locate the spec section and add or update the telemetry configuration:

    spec:
      telemetry:
        enabled: true
        metrics:
          captureOrganization: true
          captureUser: false
          captureGroup: false
          captureModelUsage: true

  The patch uses the following fields:

  enabled: true
    Activates TelemetryPolicy and Istio Telemetry to collect MaaS usage metrics.
  captureOrganization
    Includes organization identifiers in metrics. Default is true.
  captureUser
    Includes user labels in metrics. Default is false due to privacy and cardinality considerations. Enabling this option with a large number of users can significantly increase Prometheus database size.
  captureGroup
    Includes group labels in metrics. Default is false to reduce metric cardinality.
  captureModelUsage
    Tracks model-specific usage patterns. Default is true.

- Click Save.

Alternatively, you can configure the resource using the command line:

    $ oc patch tenants.maas.opendatahub.io default-tenant -n models-as-a-service \
        --type merge \
        -p '{
          "spec": {
            "telemetry": {
              "enabled": true,
              "metrics": {
                "captureOrganization": true,
                "captureUser": false,
                "captureGroup": false,
                "captureModelUsage": true
              }
            }
          }
        }'
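The cardinality note for captureUser can be made concrete with a rough estimate. The following sketch treats the number of time series per metric as the product of the distinct values of each enabled label. This is an upper bound, because not every combination occurs in practice. All counts are hypothetical and the function name is illustrative.

```shell
# Rough upper bound on time series per metric: the product of the number
# of distinct values for each label that is captured.
# $1 = subscriptions, $2 = models, $3 = users (pass 1 when captureUser is false)
estimate_series() {
  echo $(( $1 * $2 * $3 ))
}

estimate_series 5 10 1      # captureUser: false -> prints 50
estimate_series 5 10 2000   # captureUser: true, 2000 users -> prints 100000
```

With these illustrative numbers, enabling user labels multiplies the series count by the number of active users, which is why the option is disabled by default.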
Verification
- In the OpenShift console, navigate to Observe → Metrics.
- In the query field, enter the following metric name:

    authorized_calls

- Click Run queries.

If telemetry is enabled, the query returns MaaS usage metrics with labels such as subscription, limitador_namespace, and optionally model depending on your telemetry configuration.
1.12.4. View the Models-as-a-Service observability dashboard
In Red Hat OpenShift AI, you can use the Models-as-a-Service (MaaS) observability dashboard to monitor token consumption, request counts, and rate-limit violations across subscriptions and users.
The MaaS observability dashboard is intended for internal usage tracking and showback reporting. The metrics are not suitable for billing-grade metering or external invoicing.
Prerequisites
- You have cluster administrator privileges for the OpenShift cluster where OpenShift AI is installed.
- You have installed Red Hat OpenShift AI.
- You have configured observability for Models-as-a-Service. For more information, see Prerequisites for Models-as-a-Service.
- You have enabled Kuadrant observability. For more information, see Enable Kuadrant observability for Models-as-a-Service.
- You have enabled telemetry for Models-as-a-Service. For more information, see Enable telemetry for Models-as-a-Service.
Procedure
- In the OpenShift AI dashboard, click Observe & monitor → Dashboard in the left navigation menu. The Observability dashboard page displays three tabs: Cluster, Models, and Usage.
- Click the Usage tab to view Models-as-a-Service usage metrics.
- Optional: To change the time range, select a value from the Time period dropdown. Options range from the last 5 minutes to the last 14 days. You can also specify a custom date range.
- Optional: Filter the metrics by user, subscription, or model by selecting values from the User, Subscription, or Model dropdowns. Select All in any dropdown to view metrics for all items in that category.
- Review the Overview section, which displays summary metrics including Total Tokens, Total Requests, Total Errors, Success Rate, and Active Users.
- Review the Token Consumption by User table, which shows detailed per-user usage data.
- Optional: Click column headers in the table to sort by that column.
- Optional: Use the pagination controls at the bottom of the table to navigate through multiple pages of results or adjust the number of rows displayed per page.
Verification
- The Overview section shows non-zero values for users with recent activity.
- The Token Consumption by User table lists users with token consumption in the selected time period.
- Changing the time period updates the metrics to reflect the new range.
1.12.5. Export usage data for cost attribution
In Red Hat OpenShift AI, you can export Models-as-a-Service usage data in CSV format for cost attribution and showback reporting to finance teams.
Prerequisites
- You have cluster administrator privileges for the OpenShift cluster where OpenShift AI is installed.
- The Cluster Observability Operator is installed and configured on your cluster.
- You have access to the OpenShift AI dashboard with administrator privileges.
Procedure
- In the OpenShift AI dashboard, click Observe & monitor → Dashboard.
- Click the Usage tab.
- Optional: Configure filters to export specific usage data:
  - Select a time period from the Time period dropdown to export metrics for a specific timeframe.
  - Select filters for User, Subscription, or Model to export metrics for specific resources.
- Hover over the Token Consumption by User table.
- Click Export as CSV to download the usage data.
  The system generates a CSV file containing the filtered usage data.
- Save the CSV file to your local system.

The exported CSV file contains subscription-level usage data for the selected time period and filters. This data is suitable for showback reporting, but it is not guaranteed to be accurate enough for billing. For production chargeback workflows, configure external metering and billing tools to consume this data.
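For showback reporting, the exported file can be summarized with standard tools. The following sketch totals tokens per subscription. The column names subscription and total_tokens are assumptions, because the layout of the exported file is not described here. Check the header row of your actual export and adjust the names. The sketch also assumes simple comma-separated values without quoted fields.

```shell
# Sum tokens per subscription from an exported usage CSV.
# ASSUMPTION: the header row has columns named "subscription" and
# "total_tokens" (hypothetical names; adjust to match your export).
tokens_per_subscription() {
  awk -F',' '
    NR == 1 {
      # Map each column name in the header to its position.
      for (i = 1; i <= NF; i++) col[$i] = i
      next
    }
    { total[$(col["subscription"])] += $(col["total_tokens"]) }
    END { for (s in total) printf "%s,%d\n", s, total[s] }
  ' "$1" | sort
}
```

Running tokens_per_subscription usage.csv prints one subscription,total line per subscription, sorted by name.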
Verification
- The CSV file downloads to your local system and contains usage data matching your selected filters and time period.
Next steps
- Provide the exported usage data to your finance team for cost attribution and showback reporting.
1.13. Configure external OIDC authentication for Models-as-a-Service
External OIDC authentication for Models-as-a-Service is currently available in Red Hat OpenShift AI as a Technology Preview feature. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.
For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.
You can configure Models-as-a-Service (MaaS) to authenticate users with an external OpenID Connect (OIDC) identity provider, enabling enterprise-wide access without requiring OpenShift accounts for every user. This allows organizations to integrate MaaS with existing identity providers such as Keycloak, and map external user groups to MaaS subscriptions for access control and quota enforcement.
1.13.1. About external OIDC authentication for Models-as-a-Service
In Red Hat OpenShift AI, you can authenticate Models-as-a-Service users through an external OpenID Connect (OIDC) identity provider, allowing them to use their existing corporate credentials without requiring OpenShift user accounts.
1.13.1.1. Authentication flow
Models-as-a-Service (MaaS) uses a two-tier authentication approach with external OIDC providers as follows:
- MaaS platform access: Users retrieve a JWT from the external OIDC provider to access platform APIs. The Gateway API validates the OIDC token.
- Model access: Users create API keys through the MaaS API by using their OIDC token. These API keys are used for programmatic access to models through the MaaS API gateway.
When using external OIDC authentication, users create API keys through the MaaS API by using curl or other HTTP clients. The OpenShift AI dashboard does not support API key creation for external OIDC users.
This approach provides industry-standard OIDC authentication for user login while maintaining centralized API key management for model access.
1.13.1.2. Group-based access control
MaaS validates external identity provider groups directly from OIDC tokens to determine user access. The OIDC token must include group claims for authorization to work. The validation process follows these steps:
- The OIDC provider defines user groups.
- The OIDC token includes group claims when a user authenticates. For example, a token might include groups: ["data-scientists", "ml-engineers"]. The groups claim is required for MaaS authorization.
- MaaS subscriptions define which groups have access to specific models. For example, a subscription might grant access to the data-scientists group.
- Authorization policies validate the group claims in the user’s OIDC token against the groups defined in subscriptions. If the token includes data-scientists and the subscription grants access to that group, the user is authorized.
Group names in MaaS subscriptions and authorization policies must match the group names in the OIDC token claims exactly. MaaS validates groups directly from the token without requiring OpenShift group creation or synchronization.
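Because group names must match exactly, it can help to inspect the claims in a token before you configure subscriptions. The following sketch decodes the payload segment of a JWT with standard shell tools. It only decodes the token and does not verify its signature. The function name is illustrative.

```shell
# Print the decoded JSON payload (second dot-separated segment) of a JWT.
# This does NOT verify the signature; use it for inspection only.
decode_jwt_payload() {
  # Convert the base64url alphabet back to standard base64.
  payload=$(printf '%s' "$1" | cut -d '.' -f 2 | tr '_-' '/+')
  # base64url omits '=' padding; restore it so that base64 accepts the input.
  case $(( ${#payload} % 4 )) in
    2) payload="${payload}==" ;;
    3) payload="${payload}=" ;;
  esac
  printf '%s' "$payload" | base64 -d
}
```

For example, decode_jwt_payload "$OIDC_TOKEN" prints the claims as JSON, where OIDC_TOKEN is a variable that holds your token. Compare the values in the groups claim with the group names in your subscriptions.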
1.13.1.3. API key lifecycle
The API key creation process for OIDC-authenticated users follows these steps:
- Users retrieve a JWT from the external OIDC provider.
- Users call the MaaS API with their OIDC token to create an API key.
- The Gateway validates the OIDC token.
- MaaS generates an API key with an expiration period specified in the API request, up to the maximum limit configured in the Tenant custom resource.
- Users use this API key for model access.
API keys capture the user’s group memberships at the time of creation. If a user is removed from a group in the external OIDC provider, their existing API keys retain the original group associations and continue to work until revoked or expired. To immediately revoke access, administrators must manually revoke the user’s API keys.
1.13.1.4. Use cases
Enterprise deployment: Organizations with existing identity providers such as Keycloak can integrate MaaS without creating OpenShift accounts for every user, reducing the overhead of managing a large user base.
Service provider deployment: Service providers offering MaaS to external customers can authenticate users through a centralized OIDC provider while maintaining subscription-based isolation and quota enforcement.
Regulated industries: Organizations with compliance requirements for centralized authentication and audit logging can use external OIDC integration while maintaining MaaS governance features.
1.13.2. Configure Models-as-a-Service for external OIDC users
In Red Hat OpenShift AI, you can configure Models-as-a-Service (MaaS) to authenticate users through an external OpenID Connect (OIDC) identity provider to enforce group-based access control without requiring OpenShift accounts for every user.
Prerequisites
- You have cluster administrator privileges for the OpenShift cluster where OpenShift AI is installed.
- You have installed Red Hat OpenShift AI.
- You have deployed Models-as-a-Service.
- You have an external OIDC provider.
- You have registered a client application in your OIDC provider and obtained the issuer URL and client ID.
- You have created user groups in your external OIDC provider.
- Your external OIDC provider is configured to include group claims in ID tokens.
Procedure
- Edit the Tenant custom resource to add the externalOIDC configuration, using one of the following methods:

  Using the OpenShift console
  - In the OpenShift console, navigate to Administration → CustomResourceDefinitions.
  - Search for Tenant and click the resource name.
  - Click the Instances tab.
  - Click default-tenant.
  - Click the YAML tab.
  - In the YAML editor, locate the spec section and add or update the externalOIDC configuration:

      spec:
        externalOIDC:
          issuerUrl: <oidc-provider-issuer-url>
          clientId: <oidc-client-id>

    The configuration uses the following fields:

    <oidc-provider-issuer-url>
      Specifies the issuer URL for your external identity provider.
    <oidc-client-id>
      Specifies the client ID for your MaaS application registered with the OIDC provider.

  - Click Save.

  Using the OpenShift CLI
  Run the following command:

      $ oc patch tenants.maas.opendatahub.io default-tenant -n models-as-a-service \
          --type merge \
          -p '{
            "spec": {
              "externalOIDC": {
                "issuerUrl": "<oidc-provider-issuer-url>",
                "clientId": "<oidc-client-id>"
              }
            }
          }'

- Create MaaS subscriptions that include the groups from your OIDC provider.

  Note: Group names in subscriptions must match the group names in the OIDC token claims exactly. MaaS validates group memberships directly from the OIDC token.

  When configuring subscriptions, enter group names exactly as they appear in your OIDC provider’s group claims. For example, if your OIDC token includes groups: ["data-scientists"], enter data-scientists in the subscription.

  For information about creating subscriptions, see Managing subscriptions for Models-as-a-Service.
Verification
To verify external OIDC authentication, obtain a token from your OIDC provider and use it to access the MaaS API:
- Obtain an OIDC token from your identity provider using the OAuth 2.0 client credentials grant:

    $ curl -X POST "<oidc-token-endpoint>" \
        -d "client_id=<client-id>" \
        -d "client_secret=<client-secret>" \
        -d "grant_type=client_credentials"

  The command uses the following placeholders:

  <oidc-token-endpoint>
    Specifies the token endpoint URL from your OIDC provider.
  <client-id>
    Specifies your OIDC client ID.
  <client-secret>
    Specifies your OIDC client secret.

- Use the OIDC token to list available models:

    $ curl -H "Authorization: Bearer <oidc-token>" \
        https://<maas-gateway-url>/maas-api/v1/models

  The command uses the following placeholders:

  <oidc-token>
    Specifies the JWT obtained from your OIDC provider.
  <maas-gateway-url>
    Specifies your MaaS gateway URL.

  If authentication is successful, the API returns a list of models available to the groups in your token.

API keys capture group memberships at creation time. If a user is removed from a group in the external OIDC provider, their existing API keys continue to work until revoked or expired. Administrators must manually revoke API keys to immediately revoke access.
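The two verification requests can be chained in a small script. The following is a sketch under stated assumptions: it expects the token response to be JSON with an access_token field, as defined for OAuth 2.0 token responses, and it extracts that field with sed rather than a JSON parser. The function names are illustrative.

```shell
# Request a token with the client credentials grant and print the
# access_token field of the JSON response.
# $1 = token endpoint URL, $2 = client ID, $3 = client secret
get_oidc_token() {
  curl -s -X POST "$1" \
    -d "client_id=$2" \
    -d "client_secret=$3" \
    -d "grant_type=client_credentials" |
    sed -n 's/.*"access_token" *: *"\([^"]*\)".*/\1/p'
}

# List the models that are visible to the token holder.
# $1 = MaaS gateway host name, $2 = OIDC token
list_maas_models() {
  curl -s -H "Authorization: Bearer $2" "https://$1/maas-api/v1/models"
}
```

A typical sequence stores the result of get_oidc_token in a variable and passes that variable to list_maas_models. An empty token usually means that the provider returned an error response instead of a token.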
1.14. Configure external models for Models-as-a-Service
External models for Models-as-a-Service is currently available in Red Hat OpenShift AI as a Technology Preview feature. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.
For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.
You can configure Models-as-a-Service (MaaS) to route inference requests to models hosted by external cloud providers such as OpenAI, Anthropic, AWS Bedrock, Azure OpenAI, or Google Vertex AI. This enables unified governance and authentication for both locally deployed models and external model endpoints through the same MaaS gateway.
1.14.1. About external models for Models-as-a-Service
In Red Hat OpenShift AI, you can use Models-as-a-Service (MaaS) external models to route inference requests to large language models hosted outside the cluster, such as OpenAI, Anthropic, AWS Bedrock, Azure OpenAI, and Google Vertex AI, through the same MaaS gateway you use for locally deployed models.
External models appear in the MaaS dashboard alongside locally deployed models. Users access external models the same way as locally deployed models: administrators include the model in a subscription, users create a MaaS API key, and users make requests using the OpenAI-compatible API format. External models differ from locally deployed models in two ways: external models use a two-tier authentication pattern, and token limits at the external provider level are shared across all users.
1.14.1.1. Two-tier authentication
External model access uses a two-tier authentication pattern:
- User authentication: Users authenticate to MaaS by using their MaaS API key. MaaS validates the user subscription and confirms access to the requested external model.
- Provider authentication: MaaS automatically injects the provider API key from the Kubernetes secret when forwarding requests to the external provider.
As a result, users need only their MaaS API key to access external models. The provider API key is managed by administrators and shared across all users of that external model.
Token limits apply at two levels:
- MaaS subscription level: Token limits configured in MaaS subscriptions apply per-user within MaaS. These limits control individual user consumption.
- External provider level: Token limits imposed by the external provider on the API key apply to the aggregate consumption of all users of that external model. Because all users share the same provider API key, the provider-level limit is shared across the entire group of users.
Administrators must ensure that the token limit on the provider API key can handle the combined consumption of all users who access the external model. If the provider-level limit is exceeded, no users can access the model until the provider resets the limit, which is typically hourly, daily, or monthly depending on the provider. Individual MaaS subscription limits do not affect this provider-level restriction.
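A quick worst-case calculation shows whether the shared provider limit can absorb the per-user limits. The following sketch assumes that both limits apply to the same time window and that every user has the same MaaS token limit. All numbers are hypothetical and the function name is illustrative.

```shell
# Worst-case headroom: the provider limit minus the total consumption if
# every user uses their full MaaS limit in the same time window.
# $1 = provider token limit, $2 = per-user MaaS token limit, $3 = user count
provider_headroom() {
  echo $(( $1 - $2 * $3 ))
}

provider_headroom 1000000 20000 40   # prints 200000: the provider limit is sufficient
provider_headroom 1000000 20000 60   # prints -200000: the provider limit can be exhausted
```

A negative result means that users can collectively exhaust the provider limit while each user stays within their own subscription limit.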
1.14.2. Configure routing to external model providers
In Red Hat OpenShift AI, you can configure Models-as-a-Service (MaaS) to route inference requests to large language models (LLMs) hosted by external cloud providers such as OpenAI, Anthropic, AWS Bedrock, Azure OpenAI, or Google Vertex AI, enabling unified governance for both locally deployed models and external models. Users access external models through MaaS using their MaaS API keys. The provider API key is used by MaaS to authenticate requests to the external provider.
Token limits apply at two levels:
- MaaS subscription level: Token limits you configure in MaaS subscriptions control per-user consumption within MaaS.
- External provider level: Token limits on the provider API key, configured by the external provider, apply to the aggregate consumption of all users accessing the external model. All users share the same provider API key, so the provider-level limit is shared across all users.
Make sure that the token limit on the provider API key can handle the combined consumption of all users who access the external model. If the provider-level limit is exceeded, no users can access the model until the provider resets the limit, which is typically hourly, daily, or monthly depending on the provider.
Prerequisites
- You have cluster administrator privileges for the OpenShift cluster where OpenShift AI is installed.
- You have installed Red Hat OpenShift AI.
- You have deployed Models-as-a-Service.
- You have created at least one MaaS subscription to which you will add the external model.
- You have identified the external model provider, API endpoint hostname, and target model ID. For example, provider openai, endpoint api.openai.com, and target model gpt-4o.
- You have an API key for the external model provider.
- You have created a model namespace, such as llm.
Procedure
- Create a Kubernetes secret with the external provider API key:

    $ oc create secret generic <provider-api-key-secret> \
        --from-literal=api-key=<provider-api-key> \
        -n <model-namespace>

  The command uses the following placeholders:

  <provider-api-key-secret>
    Specifies the name of the secret containing the provider API key.
  <provider-api-key>
    Specifies the API key for the external provider.
  <model-namespace>
    Specifies the name of the model namespace you created, such as llm.

  This secret stores the provider API key that MaaS uses to authenticate requests to the external model.

- Create an ExternalModel custom resource, using one of the following methods:

  Using the OpenShift console
  - In the OpenShift console, navigate to Administration → CustomResourceDefinitions.
  - Search for ExternalModel and click the resource name.
  - Click the Instances tab.
  - From the Project dropdown, select your model namespace, such as llm.
  - Click Create ExternalModel.
  - In the YAML editor, replace the default content with the following configuration:

      apiVersion: maas.opendatahub.io/v1alpha1
      kind: ExternalModel
      metadata:
        name: <external-model-name>
        namespace: <model-namespace>
      spec:
        provider: <provider-type>
        endpoint: <external-provider-hostname>
        targetModel: <target-model-id>
        credentialRef:
          name: <provider-api-key-secret>

    The YAML file uses the following placeholders:

    <external-model-name>
      Specifies the name for the external model resource.
    <provider-type>
      Specifies the provider type. Allowed values: openai, anthropic, azure-openai, vertex, bedrock-openai.
    <external-provider-hostname>
      Specifies the fully qualified domain name (FQDN) of the external provider without scheme or path, such as api.openai.com.
    <target-model-id>
      Specifies the upstream model identifier at the provider, such as gpt-4o.
    <provider-api-key-secret>
      Specifies the name of the secret created in the previous step.
    <model-namespace>
      Specifies the name of the model namespace you created, such as llm.

  - Click Create.

  Using the OpenShift CLI
  Run the following command:

      $ cat <<EOF | oc apply -f -
      apiVersion: maas.opendatahub.io/v1alpha1
      kind: ExternalModel
      metadata:
        name: <external-model-name>
        namespace: <model-namespace>
      spec:
        provider: <provider-type>
        endpoint: <external-provider-hostname>
        targetModel: <target-model-id>
        credentialRef:
          name: <provider-api-key-secret>
      EOF

  Note: When the ExternalModel resource is created, the MaaS controller automatically creates the required networking resources: Service, HTTPRoute, ServiceEntry, and DestinationRule. These resources enable routing from the MaaS gateway to the external provider. If external model routing fails, verify that these resources were created successfully in your model namespace.

- Add the external model to a MaaS subscription.

  When creating or updating subscriptions, the external model appears in the model list alongside locally deployed models. Select the external model and configure token limits the same way as for locally deployed models.

  For information about creating subscriptions, see Managing subscriptions for Models-as-a-Service.
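If routing to the external provider fails, you can check the automatically created networking resources in one pass. The following sketch checks only that at least one resource of each kind exists in the model namespace, because the names of the generated resources are not documented here. It assumes that the Gateway API and Istio custom resource definitions are installed. The function name is illustrative.

```shell
# Report whether each networking resource kind that the MaaS controller
# creates is present in the model namespace.
# Returns non-zero if any kind has no resources.
# $1 = model namespace, such as llm
check_external_model_networking() {
  ns="$1"
  missing=0
  for kind in service httproute serviceentry destinationrule; do
    # "-o name" prints one line per resource, or nothing if none exist.
    if [ -n "$(oc get "$kind" -n "$ns" -o name 2>/dev/null)" ]; then
      echo "$kind: found"
    else
      echo "$kind: missing"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, check_external_model_networking llm prints one line per resource kind. A kind reported as missing is a starting point for further investigation of the ExternalModel resource.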
Verification
- Verify that the external model was added to the subscription:
  - In the OpenShift AI dashboard, navigate to Settings → Subscriptions.
  - Click the subscription name.
  - In the Models section, verify that the external model appears in the list.
- Log in to the OpenShift AI dashboard as a user who belongs to a group included in the subscription.
- Click Gen AI studio → AI asset endpoints.
- Verify that the external model appears in the model list.
- Create a MaaS API key and make a test inference request to the external model:

    $ curl -X POST https://<maas-gateway-url>/maas-api/v1/chat/completions \
        -H "Authorization: Bearer <maas-api-key>" \
        -H "Content-Type: application/json" \
        -d '{
          "model": "<external-model-name>",
          "messages": [{"role": "user", "content": "Hello"}]
        }'

  The command uses the following placeholders:

  <maas-api-key>
    Specifies your MaaS API key created in the dashboard.
  <maas-gateway-url>
    Specifies your MaaS gateway URL.
  <external-model-name>
    Specifies the name of the external model resource you created.

  A successful response confirms that MaaS routed the request to the external provider and returns the model completion.
1.15. Models-as-a-Service administration troubleshooting
As an OpenShift AI administrator, you can diagnose and resolve common administrative issues with Models-as-a-Service (MaaS) deployment, configuration, and management.
1.15.1. Component enablement issues
If the maas-api pod fails to start or shows errors after enabling the MaaS component:
- Check the pod logs for error messages:

    $ oc logs -n redhat-ods-applications -l app.kubernetes.io/name=maas-api

- Verify that all prerequisites are met, especially:
  - Kuadrant is running in the kuadrant-system namespace
  - The maas-default-gateway Gateway exists in the openshift-ingress namespace
  - KServe component is set to Managed in the DataScienceCluster
- If Kuadrant is not in a ready state:
  - Check the Kuadrant Operator status:

      $ oc get kuadrant -n kuadrant-system

  - If the Kuadrant resource shows a non-ready status, restart the Kuadrant Operator:
    - In the OpenShift console, navigate to Operators → Installed Operators.
    - Select the kuadrant-system namespace.
    - Click Red Hat Connectivity Link.
    - From the Actions menu, click Restart.
  - Wait for the Operator to restart and verify that Kuadrant becomes ready:

      $ oc wait Kuadrant -n kuadrant-system kuadrant --for=condition=Ready --timeout=5m

- Check for events related to the MaaS deployment:

    $ oc get events -n redhat-ods-applications --sort-by='.lastTimestamp' | grep maas

- Verify that the required RBAC resources were created:

    $ oc get clusterrole | grep maas
    $ oc get clusterrolebinding | grep maas
1.15.2. Dashboard visibility issues
If MaaS features do not appear in the dashboard:
- Verify that the MaaS API component is running:

    $ oc get pods -n redhat-ods-applications -l app.kubernetes.io/name=maas-api

- Check that the OdhDashboardConfig was updated correctly:

    $ oc get odhdashboardconfig odh-dashboard-config -n redhat-ods-applications -o yaml | grep -A 2 "dashboardConfig:"

  Verify that modelAsService is set to true for admin features (Subscriptions, Authorization Policies) and that genAiStudio is set to true for user-facing features (Models tab in AI asset endpoints).
- Clear your browser cache and hard refresh the dashboard (Ctrl+Shift+R or Cmd+Shift+R).
- Check the dashboard pod logs for errors:

    $ oc logs -n redhat-ods-applications $(oc get pods -n redhat-ods-applications -o name | grep dashboard | head -1) --tail=50

- Verify that you have the required permissions to view MaaS features. Admin features require administrator access.
1.15.3. Model visibility issues
If a model is missing from the available models for MaaS:
- Verify that you selected Publish as MaaS in the Advanced settings during deployment.
- Check that the MaaSModelRef was created:

    $ oc get maasmodelref -n <your-project-namespace>

- Check that the MaaS API is running:

    $ oc get pods -n redhat-ods-applications -l app.kubernetes.io/name=maas-api

- Verify that the model deployment is in a Ready state:

    $ oc get llminferenceservice -n <your-project-namespace>
1.15.4. User access errors: 403 Forbidden
If users receive 403 Forbidden errors when accessing models through MaaS:
- Verify that the user has both a subscription and an authorization policy:
  - A subscription grants quota for specific models with token limits.
  - An authorization policy is required to authorize groups to access model endpoints through the API gateway.
- Check that a subscription exists for the user’s groups:

    $ oc get maassubscriptions -n redhat-ods-applications

- Verify that the model is included in the subscription:

    $ oc get maassubscription <subscription-name> -n redhat-ods-applications -o yaml

- Check that an authorization policy exists:

    $ oc get maasauthpolicies -n redhat-ods-applications

- Verify that the authorization policy includes the user’s groups:

    $ oc get maasauthpolicy <policy-name> -n redhat-ods-applications -o yaml
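Because a user needs both a subscription and an authorization policy, a combined check can speed up diagnosis. The following sketch searches the YAML output of both resource types for a group name. It uses a plain text match because the resource schemas are not shown in this section, so a match does not prove that the group is configured in the correct field. The function name is illustrative.

```shell
# Check that a group name appears in at least one subscription and in at
# least one authorization policy.
# ASSUMPTION: a plain text match on the YAML output is a good enough hint.
# $1 = group name
check_group_access() {
  group="$1"
  ns="redhat-ods-applications"
  status=0
  for kind in maassubscriptions maasauthpolicies; do
    if oc get "$kind" -n "$ns" -o yaml | grep -qF -- "$group"; then
      echo "$kind: references $group"
    else
      echo "$kind: no reference to $group"
      status=1
    fi
  done
  return "$status"
}
```

For example, check_group_access data-scientists reports which of the two resource types mention the group. A missing reference in either one is a likely cause of a 403 Forbidden error.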
1.15.5. Subscription access control issues
If users receive unexpected access denials:
- Verify that the subscription status is Active:

    $ oc get maassubscription <subscription-name> -n redhat-ods-applications -o jsonpath='{.status.phase}'

- Check the subscription conditions for errors:

    $ oc get maassubscription <subscription-name> -n redhat-ods-applications -o jsonpath='{.status.conditions}'

- Ensure the model deployment is ready:

    $ oc get llminferenceservice -n <model-namespace>

- Check the Gateway logs for authorization errors:

    $ oc logs -n kuadrant-system -l app=authorino --tail=50

- Verify that the MaaSModelRef exists for the model:

    $ oc get maasmodelref -n <model-namespace>
1.15.6. Subscription management issues
If users receive access errors when attempting to use models after creating a subscription:
- Verify that the user’s OpenShift groups are listed in the subscription’s groups.
- Verify that the model is included in the subscription’s model list.
- Check that at least one token limit is configured for each model in the subscription.
- If multiple subscriptions apply to a user, verify that the correct subscription is being used based on priority level (higher numbers have higher priority).
- Check the MaaS API logs for subscription resolution errors:

    $ oc logs -n redhat-ods-applications -l app.kubernetes.io/name=maas-api --tail=50 | grep subscription

- Verify that a matching authorization policy was created if you selected that option during subscription creation.
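The priority rule in the preceding list (higher numbers have higher priority) can be illustrated with a short sketch. It selects the highest-priority entry from a list of name and priority pairs. The subscription names and priority values are hypothetical, and the sketch does not model how MaaS resolves ties.

```shell
# Print the name of the subscription with the highest numeric priority.
# Input on stdin: one "name priority" pair per line, separated by a space.
highest_priority_subscription() {
  sort -k2,2nr | head -n 1 | cut -d ' ' -f 1
}

printf '%s\n' 'basic 10' 'premium 50' 'trial 1' | highest_priority_subscription
# prints: premium
```

If the subscription that this rule selects is not the one you expect, review the priority values of all subscriptions that include the user’s groups.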
1.15.7. Subscription phase shows Failed
If a subscription shows a Failed phase status:
- Check the subscription status conditions for the failure reason:

    $ oc describe maassubscription <subscription-name> -n redhat-ods-applications

- Verify that all referenced models exist and have valid MaaSModelRef objects.
- Ensure that the groups specified in the subscription are valid OpenShift groups.
- Check that token limits are properly configured for all models.
- Review the MaaS API logs for detailed error messages:

    $ oc logs -n redhat-ods-applications -l app.kubernetes.io/name=maas-api --tail=100