Accelerate enterprise software development with NVIDIA and MaaS

Optimize private app development using NVIDIA Nemotron models through Models-as-a-Service on your own multi-tenant infrastructure in Red Hat AI.


This content is authored by Red Hat experts, but has not yet been tested on every supported configuration.

Detailed description

Developing software with speed and efficiency is a competitive necessity. Developers are often overwhelmed and slowed down by repetitive code, complicated debugging and testing, and the constant need to learn new technologies. AI-powered coding assistance can help, but how do you leverage it securely and cost-effectively?

For organizations bound by strict data privacy requirements, regulations, or specific performance needs, publicly hosted AI services are often not an option. As usage expands, you also need to keep costs under control. Models as a Service (MaaS) solves this by enabling centralized IT teams to host and manage private models that remote teams can consume easily and securely. This keeps proprietary data within the organization’s boundaries while giving developers access to the generative AI technology they need. By providing access to the models via API tokens, administrators can also enforce specific rate limits and quotas. This approach doesn’t just simplify access and usage; it allows organizations to monitor metrics, forecast capacity and compute needs, and manage chargebacks with precision.
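The per-token rate limiting described above can be sketched with a simple token bucket, which is conceptually what a MaaS gateway enforces for each issued API token. The limits below are made-up illustrations, not the quickstart's defaults:

```python
import time

class TokenBucket:
    """Minimal per-API-token rate limiter: a burst capacity that refills over time."""
    def __init__(self, capacity, refill_per_sec):
        self.capacity = capacity
        self.tokens = float(capacity)
        self.refill_per_sec = refill_per_sec
        self.last = time.monotonic()

    def allow(self):
        # Refill proportionally to elapsed time, capped at capacity.
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# One bucket per issued API token; hypothetical limits for illustration only.
buckets = {"team-a-token": TokenBucket(capacity=5, refill_per_sec=1)}

# 10 back-to-back requests: only the burst capacity passes, the rest are throttled.
granted = sum(buckets["team-a-token"].allow() for _ in range(10))
print(granted)  # 5
```

In the quickstart itself this enforcement is handled by the MaaS governance layer, not application code; the sketch only shows the underlying idea.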

This quickstart demonstrates how you can easily deploy a private AI code assistant powered by NVIDIA Nemotron models and delivered through Red Hat AI's integrated Models as a Service (MaaS) offering. Developers access the assistant through OpenShift DevSpaces, a containerized cloud-native IDE included in OpenShift.
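Once a developer has an API token, the private endpoint is consumed like any OpenAI-compatible API. A minimal sketch of building such a request follows; the endpoint URL and token are placeholders that your MaaS administrator would issue, and only the payload construction is shown (no network call):

```python
import json

# Hypothetical values: the real endpoint URL and API token come from your
# MaaS administrator. The model name matches this quickstart's deployment.
ENDPOINT = "https://maas.apps.example.com/v1/chat/completions"
API_TOKEN = "sk-example-not-a-real-token"

payload = {
    "model": "nemotron-3-nano-30b-a3b-fp8",
    "messages": [
        {"role": "system", "content": "You are a coding assistant."},
        {"role": "user", "content": "Write a Python function that reverses a string."},
    ],
    "max_tokens": 256,
}
headers = {
    "Authorization": f"Bearer {API_TOKEN}",
    "Content-Type": "application/json",
}

# Send with any HTTP client, for example:
#   requests.post(ENDPOINT, headers=headers, data=json.dumps(payload), timeout=60)
body = json.dumps(payload)
```

Because the gateway identifies callers by the bearer token, the same request shape works for every tier; only the rate limits applied behind the scenes differ.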

Architecture diagrams

Code Assistant w/ MaaS Architecture

This diagram illustrates a models-as-a-service architecture on Red Hat AI, including the model deployments as well as the code assistant application running in OpenShift DevSpaces.

| Layer/Component | Technology | Purpose/Description |
| --- | --- | --- |
| Orchestration | Red Hat AI Enterprise | Container orchestration and comprehensive AI platform |
| Inference | vLLM and llm-d | High-performance inference engine for generative AI model deployment, with Kubernetes-native distributed inference capabilities via llm-d |
| LLM | nemotron-3-nano-30b-a3b-fp8 | A quantized 30B-parameter hybrid Mamba-Transformer MoE model with a 1M-token context window, designed for efficient reasoning, chat, and agentic AI applications |
| Models-as-a-Service | Red Hat AI Enterprise | Integrated LLM governance layer that provides rate-limited model access with usage tracking and chargeback across teams |
| GPU Acceleration | NVIDIA GPU Operator | Enables GPUs and manages drivers, DCGM, the container toolkit, and MIG capabilities for GPU acceleration |
| Development Environment | OpenShift DevSpaces | Provides IDE instances so development teams can develop and deploy on the same cluster |
| Observability | Prometheus Operator | Monitors model inference metrics and GPU telemetry |
| Dashboard | Grafana | Surfaces metrics scraped by Prometheus in custom Grafana dashboards |
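As a rough illustration of the chargeback idea mentioned for the MaaS layer, per-team token usage can be rolled up into costs. The usage records and per-token prices below are invented for illustration; real numbers would come from the Prometheus metrics surfaced in the Grafana dashboards:

```python
# Hypothetical usage records and prices, for illustration only.
usage = [
    {"team": "payments", "tier": "enterprise", "tokens": 1_200_000},
    {"team": "mobile", "tier": "premium", "tokens": 400_000},
    {"team": "labs", "tier": "free", "tokens": 50_000},
]
price_per_1k_tokens = {"free": 0.0, "premium": 0.02, "enterprise": 0.05}

# Cost per team = (tokens / 1000) * tier price.
chargeback = {
    rec["team"]: round(rec["tokens"] / 1000 * price_per_1k_tokens[rec["tier"]], 2)
    for rec in usage
}
print(chargeback)  # {'payments': 60.0, 'mobile': 8.0, 'labs': 0.0}
```

The quickstart does not prescribe a pricing model; this is only a sketch of how the usage data it collects could feed an internal chargeback calculation.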

Requirements

Minimum hardware requirements

  • One NVIDIA GPU node with at least 48GB VRAM for the Nemotron model
  • One NVIDIA GPU node with at least 48GB VRAM for the gpt-oss model

Note: Models in this quickstart were tested with 2 L40S GPU instances on AWS (instance type g6e.2xlarge).
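As a back-of-envelope check on the 48GB figure, an FP8-quantized 30B-parameter model's weights alone need roughly 30 GB, leaving headroom for the KV cache and runtime overhead. The arithmetic below is a sketch; the 10 GB overhead figure is an assumption, not a measured value:

```python
# Rough VRAM estimate for a quantized model; illustrative only. Real usage
# depends on KV cache size, context length, and vLLM configuration.
params_billions = 30
bytes_per_param = 1  # FP8 stores roughly one byte per parameter

weights_gb = params_billions * bytes_per_param  # ~30 GB of weights
overhead_gb = 10  # assumed headroom for KV cache, activations, CUDA context
required_gb = weights_gb + overhead_gb

print(required_gb <= 48)  # True: fits one 48 GB L40S under these assumptions
```

A BF16 version of the same model would need roughly twice the weight memory, which is why the quantized variant is what fits comfortably on a single 48GB GPU.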

Minimum software requirements

  • Red Hat OpenShift 4.20
  • Helm CLI
  • OpenShift Client CLI
  • Bash shell available in PATH
  • sed available in PATH (both the macOS/POSIX version and common GNU versions work)

Required user permissions

  • Regular user permissions for usage of Models-as-a-Service enabled endpoint, access to DevSpaces workspace, and access to Grafana dashboard for viewing usage data.
  • Cluster Admin access needed for any changes to model deployments or MaaS configurations.

Deploy

The following instructions deploy the quickstart to your Red Hat AI environment using an automated, script-based installation. The script configures the necessary prerequisites for your environment and wires everything together, removing the need for additional configuration.

Please see the advanced deployment section for details on setting up your own prerequisites and deploying the quickstart with more control.

Prerequisites

  • OpenShift cluster (specific version is specified in the software requirements section)
    • Optional (recommended): trusted certificates managed for the OpenShift Router, as documented
  • A default StorageClass needs to be configured. If your cluster is on a cloud provider, this is probably available out of the box. If you're on bare metal or some hypervisor environments, you may need to install additional operators to enable a default StorageClass. See the documentation for OpenShift Data Foundation or the LVM Storage Operator documentation for installation on bare metal
  • OpenShift cluster has GPUs available
  • The NVIDIA GPU Operator is installed and configured with a ClusterPolicy (or other API) to configure the driver and make the resources available to Kubernetes to schedule
  • You do not have other workloads or configurations in the cluster, meaning:
    • An identity provider is not deployed or configured
    • Red Hat OpenShift AI is not installed
    • Red Hat Connectivity Link is not deployed or configured
    • Red Hat OpenShift Dev Spaces is not deployed

Installation Steps

  1. Clone the quickstart repository:
git clone https://github.com/rh-ai-quickstart/maas-code-assistant.git
  2. Change into the directory:
cd maas-code-assistant
  3. Ensure you’re logged into your cluster as a cluster-admin user, such as kube:admin or system:admin:
oc whoami
  4. Run all-in-one.sh. Enter passwords for the admin and user accounts when prompted (these are saved in the .env file after the first run of the script, so you won’t be prompted again).
./all-in-one.sh

[!NOTE] This installation will leave the kubeadmin user in your cluster, prompting you to select a source to log in from. The rhbk option added to this menu is required to use the users and passwords specified above, and to be able to use MaaS models. If you would like to remove the prompt to select an identity provider and have it default to the Red Hat build of Keycloak, you can edit environment.yaml.tpl and set keycloak.removeKubeAdmin to true before running the script.

Delete

To remove the core quickstart components (models, Dev Spaces workspaces, etc.) run the following:

helm uninstall maas-code-assistant

To clean up the dependencies, such as OpenShift AI, follow their documented uninstallation procedures: remove their Operands first and allow the operators to reconcile and complete the removal, then uninstall the operators themselves.

References

  • vLLM: The High-Throughput and Memory-Efficient inference and serving engine for LLMs.
  • llm-d: a Kubernetes-native high-performance distributed LLM inference framework.
  • Red Hat OpenShift DevSpaces: a container-based, in-browser development environment offered by Red Hat that facilitates cloud-native development directly within the OpenShift ecosystem. Included within the OpenShift product offering.
  • NVIDIA Nemotron: a family of open models with open weights, training data, and recipes, delivering leading efficiency and accuracy for building specialized AI agents.
  • NVIDIA GPU Operator: uses the operator framework within Kubernetes to automate the management of all NVIDIA software components needed to provision GPU.

Advanced Deployment

This advanced deployment option will allow you to control the deployment of all prerequisites separately and tailor it to your specific environment.

Use this deployment path if you:

  • Have a configured cluster with some or all of the prerequisites already deployed.
  • Prefer a different configuration path than the defaults set in the quickstart repository installation script.
  • Are using the cluster for other workloads and therefore need to customize the installation to avoid conflict with existing cluster resources.

Prerequisites

The following prerequisites are required in your environment to prevent any conflicts with the quickstart:

  • Users have been configured with OpenShift OAuth, backed by OIDC or some other auth method such as htpasswd, as documented.
  • OpenShift cluster and user-workload monitoring is configured, as documented.
  • Grafana is deployed and managed through the Grafana Operator, in the grafana namespace.
    • An example Grafana operand, with all RBAC and resources wired up to User Workload Monitoring, is available in docs/examples/grafana.yaml. It expects that your Grafana Operator installation was namespace scoped, and deployed to the grafana namespace, and that your in-cluster registry is configured as documented. You can configure it differently to not depend on the registry.
  • Red Hat OpenShift Dev Spaces is deployed, as documented.
    • A basic CheCluster resource is configured, as in steps 2 and 3 of the above.
  • The cert-manager Operator for Red Hat OpenShift has been deployed as documented.
  • The Leader Worker Set Operator has been deployed, as documented.
  • Red Hat OpenShift AI version 3.3.0 has been deployed from the fast-3.x, stable-3.x, or stable-3.3 channels, as documented.
    • A Data Science Cluster has been created that enables at least the Dashboard, KServe, and Llama Stack Operator components, as documented.
    • Note that using Manual approval mode with the startingCSV set to rhods-operator.3.3.0 is recommended to stay on the version tested with this code base.
  • Red Hat Connectivity Link has been deployed from the stable channel, as documented.
    • A Kuadrant resource has been installed in the kuadrant-system namespace, as documented.
    • The Authorino resource that gets created from this Kuadrant instance has been modified with the following to enable TLS on the Authorino endpoint:
      oc annotate service -n kuadrant-system authorino-authorino-authorization service.beta.openshift.io/serving-cert-secret-name=authorino-server-cert --overwrite
      oc patch authorino -n kuadrant-system authorino --type=merge --patch '{"spec": {"listener": {"tls": {"enabled": true, "certSecretRef": {"name": "authorino-server-cert"}}}}}'
      
  • You have created the openshift-default GatewayClass object for Gateway API in OpenShift, and are able to create Gateway instances using your cluster's load balancer and infrastructure configuration. See the documentation for more details about Gateway API in OpenShift.
  • You have created the maas-default-gateway Gateway object in the openshift-ingress namespace using an infrastructure configuration that is supported for your environment, and it shows as programmed when verified as documented. It additionally needs the opendatahub.io/managed: "false" label and the opendatahub.io/managed: "false" and security.opendatahub.io/authorino-tls-bootstrap: "true" annotations set. Without these, policy enforcement will not work as expected.
    • An example of some possible Gateway configurations is available as a Helm template in this repository, at charts/dependency-operators/files/openshift-ai/gateway.yaml. You can use this template as the basis of a custom manifest by removing the templating syntax and configuring it to suit your environment.

Installation Steps

  1. Ensure you’re logged into your cluster as a cluster-admin user:
oc whoami
oc get nodes
  2. Copy charts/maas-code-assistant/values.yaml to edit it:
cp charts/maas-code-assistant/values.yaml environment.yaml
  3. Edit the file and update the following sections to match your environment:

    1. global.wildcardDomain and global.wildcardCertName

      1. You can recover the proper values by running the following:
      oc get ingresscontroller -n openshift-ingress-operator default -ojsonpath='{.status.domain}{"\n"}'
      oc get ingresscontroller -n openshift-ingress-operator default -ojsonpath='{.spec.defaultCertificate.name}{"\n"}'
      
    2. If you are on a bare metal or non-cloud hypervisor environment, your integrated image registry might be disabled. If it is, update global.toolsImage to refer to a container image that at least contains oc.

      1. You can get one such image for your cluster by running the following:
      oc adm release info --image-for=tools
      
    3. grafana.namespace and grafana.selectors

      1. Use the Namespace of your Grafana resource for the Grafana Operator.
      2. Set selectors to match labels on your Grafana instance. For example, if you get the following output:
      oc get grafana grafana -n grafana -ojsonpath='{.metadata.labels}' | jq .
      

      {
        "app": "grafana"
      }
      You should set selectors to app: grafana.

  4. Update the tiers section to map users to your desired default MaaS tiers.

    1. For example, if you have users named “bob,” “sue,” and “tom,” and would like them all to be in the enterprise tier, with user “sally” in the premium tier and “frank” in the free tier, use the following value for tiers:
    tiers:
      free:
        users:
          - frank
      premium:
        users:
          - sally
      enterprise:
        users:
          - bob
          - sue
          - tom
    
    2. If you would like to change the request rates and token rates as well, feel free to do so.
  5. Complete any tweaks necessary to the models array to ensure the workloads are scheduled on your GPU-enabled nodes. This may involve changing the tolerations, adjusting the resources, adding the nodeSelector field to each model and configuring it with a valid nodeSelector for the pod template, and so on.

  6. Install the quickstart with helm:

helm install maas-code-assistant ./charts/maas-code-assistant -f environment.yaml

Tags

  • Product: Red Hat AI Enterprise
  • Use case: Code development
  • Industry: Adopt and scale AI