Accelerate enterprise software development with NVIDIA and MaaS

Optimize private app development using NVIDIA Nemotron models through Models-as-a-Service on your own multi-tenant infrastructure in Red Hat AI.


This content is authored by Red Hat experts, but has not yet been tested on every supported configuration.



Detailed description

Developing software with speed and efficiency is a competitive necessity. Developers are often overwhelmed and slowed down by repetitive code, complicated debugging and testing, and the constant need to learn new technologies. AI-powered coding assistance can help, but how do you leverage it securely and cost-effectively?

For organizations bound by strict data privacy requirements, regulations, or specific performance needs, publicly hosted AI services are often not an option. As usage expands, cost efficiency also becomes a concern. Models as a Service (MaaS) solves this by enabling centralized IT teams to host and manage private models that remote teams can consume easily and securely. This keeps proprietary data within the organization’s boundaries while giving developers access to the generative AI technology they need. Because access to the models is granted through API tokens, administrators can also enforce per-team rate limits and quotas. This approach doesn’t just simplify access and usage; it allows organizations to monitor metrics, forecast capacity and compute needs, and manage chargebacks with precision.
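For example, a developer holding an API token might call a served model through the OpenAI-compatible chat completions API that vLLM exposes. The endpoint URL, token, and model ID below are placeholders; substitute the values issued by your MaaS administrator.

```shell
# Placeholder values -- your MaaS administrator issues the real ones.
MAAS_ENDPOINT="https://maas.apps.example.com/v1"   # hypothetical route
MAAS_TOKEN="sk-example-token"                      # per-user API token

# Standard OpenAI-compatible chat completions request; rate limits and
# quotas are enforced server-side based on the token's tier.
curl -s "${MAAS_ENDPOINT}/chat/completions" \
  -H "Authorization: Bearer ${MAAS_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "nemotron-3-nano-30b-a3b-fp8",
        "messages": [{"role": "user", "content": "Write a unit test for this function."}]
      }'
```

Because every request carries a token, usage can be attributed to a user and tier for metering and chargeback.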

This quickstart demonstrates how you can easily deploy a private AI code assistant powered by NVIDIA Nemotron models and delivered through Red Hat AI's integrated Models as a Service (MaaS) offering. Developers access the assistant through OpenShift Dev Spaces, a containerized cloud-native IDE included with OpenShift.

Architecture diagrams

Code Assistant w/ MaaS Architecture

This diagram illustrates a Models-as-a-Service architecture on Red Hat AI, including the model deployments and the code assistant application delivered through OpenShift Dev Spaces.

  • Orchestration: Red Hat AI Enterprise — container orchestration and a comprehensive AI platform.
  • Inference: vLLM and llm-d — high-performance inference engine for generative AI model deployment, with Kubernetes-native distributed inference provided by llm-d.
  • LLM: nemotron-3-nano-30b-a3b-fp8 — a quantized 30B-parameter hybrid Mamba-Transformer MoE model with a 1M-token context window, designed for efficient reasoning, chat, and agentic AI applications.
  • Models-as-a-Service: Red Hat AI Enterprise — integrated LLM governance layer that provides rate-limited model access with usage tracking and chargeback across teams.
  • GPU acceleration: NVIDIA GPU Operator — enables GPUs and manages drivers, DCGM, the container toolkit, and MIG capabilities.
  • Development environment: OpenShift Dev Spaces — provides IDE instances so development teams can develop and deploy on the same cluster.
  • Observability: Prometheus Operator — monitors model inference metrics and GPU telemetry.
  • Dashboard: Grafana — surfaces metrics scraped by Prometheus in custom Grafana dashboards.

Requirements

Minimum hardware requirements

  • One NVIDIA GPU node with 48GB VRAM for Nemotron model
  • One NVIDIA GPU node with 48GB VRAM for gpt-oss model

Note: Models in this quickstart were tested with 2 L40S GPU instances on AWS (instance type g6e.2xlarge).

Minimum software requirements

  • Red Hat OpenShift 4.20
  • Red Hat OpenShift AI 3.2
  • Helm CLI
  • OpenShift Client CLI
  • Bash shell available in PATH
  • sed available in PATH

Required user permissions

  • Regular user permissions are sufficient to use the Models-as-a-Service endpoint, access a Dev Spaces workspace, and view usage data in the Grafana dashboard.
  • Cluster admin access is needed for any changes to model deployments or MaaS configurations.

Deploy

The following instructions deploy the quickstart to your Red Hat AI environment using a script-based, all-in-one installation. The script configures the necessary prerequisites for your environment and wires everything together, removing the need for additional configuration.

Please see the advanced deployment section for details on setting up your own prerequisites and deploying the quickstart with more control.

Prerequisites

  • OpenShift cluster (specific version is specified in the software requirements section)
    • Optional: certificates managed for the OpenShift Router
  • OpenShift cluster has GPUs available
  • The NVIDIA GPU Operator is installed and configured with a ClusterPolicy to configure the driver
  • You do not have other workloads or configurations in the cluster, such as:
    • An identity provider deployed and configured
    • Red Hat OpenShift AI installed
    • Red Hat Connectivity Link deployed and configured
    • Red Hat OpenShift Dev Spaces deployed

Installation Steps

  1. Clone the quickstart repository:
git clone https://github.com/rh-ai-quickstart/maas-code-assistant.git
  2. Change into the directory:
cd maas-code-assistant
  3. Ensure you’re logged into your cluster as a cluster-admin user, such as kube:admin or system:admin:
oc whoami
  4. Run all-in-one.sh and enter passwords for the admin and user accounts when prompted:
./all-in-one.sh

Delete

To remove the core quickstart components (models, Dev Spaces workspaces, etc.) run the following:

helm uninstall maas-code-assistant

To remove the Developer Preview of MaaS, run this afterwards:

oc delete -k ./dev-preview

To clean up other dependencies, such as Red Hat Connectivity Link and OpenShift AI, follow their documented uninstallation procedures: remove their operands first, allow the operators to reconcile and complete the removal, and then uninstall the operators themselves.
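As a sketch of that operand-first pattern, cleanup might look like the following. The resource kinds and namespaces are assumptions based on each operator's defaults; verify the actual operand names against your cluster before running anything.

```shell
# Operand-first cleanup sketch (resource kinds and namespaces assumed
# from operator defaults -- verify against your cluster first):

# 1. Delete the operands and let each operator reconcile the removal
oc delete kuadrant --all -n kuadrant-system
oc delete datasciencecluster --all
oc delete checluster --all -n openshift-devspaces

# 2. Only after the operands are gone, uninstall the operators themselves,
#    e.g. by removing their Subscription and ClusterServiceVersion
```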

References

  • vLLM: The High-Throughput and Memory-Efficient inference and serving engine for LLMs.
  • llm-d: a Kubernetes-native high-performance distributed LLM inference framework.
  • Red Hat OpenShift Dev Spaces: a container-based, in-browser development environment offered by Red Hat that facilitates cloud-native development directly within the OpenShift ecosystem. Included within the OpenShift product offering.
  • NVIDIA Nemotron: a family of open models with open weights, training data, and recipes, delivering leading efficiency and accuracy for building specialized AI agents.
  • NVIDIA GPU Operator: uses the operator framework within Kubernetes to automate the management of all NVIDIA software components needed to provision GPU.

Advanced Deployment

This advanced deployment option will allow you to control the deployment of all prerequisites separately and tailor it to your specific environment.

Use this deployment path if you:

  • Have a configured cluster with some or all of the prerequisites already deployed.
  • Prefer a different configuration path than the defaults set in the quickstart repository installation script.
  • Are using the cluster for other workloads and therefore need to customize the installation to avoid conflict with existing cluster resources.

Prerequisites

The following prerequisites are required in your environment to prevent any conflicts with the quickstart:

  • Users have been configured with OpenShift OAuth, backed by OIDC or some other auth method such as htpasswd, as documented.
  • OpenShift cluster and user-workload monitoring is configured, as documented.
  • Grafana is deployed and managed through the Grafana Operator, in the grafana namespace.
    • An example Grafana operand, with all RBAC and resources wired up to User Workload Monitoring, is available in docs/examples/grafana.yaml. It expects a namespace-scoped Grafana Operator installation in the grafana namespace and a configured in-cluster registry.
  • Red Hat OpenShift Dev Spaces is deployed, as documented.
    • A basic CheCluster resource is configured, as in steps 2 and 3 of the above.
  • Red Hat OpenShift AI version 3.2.0 has been deployed from the fast-3.x channel, as documented.
    • A Data Science Cluster has been created that enables at least the Dashboard, KServe, and Llama Stack Operator components, as documented.
    • Note that using Manual approval mode with the startingCSV set to rhods-operator.3.2.0 is recommended to stay on the version tested with this code base.
  • Red Hat Connectivity Link has been deployed from the stable channel, as documented.
    • A Kuadrant resource has been installed in the kuadrant-system namespace, as documented.
  • You have created the openshift-default GatewayClass object for Gateway API in OpenShift, and are able to create Gateway instances using your cluster's load balancer and infrastructure configuration. See the documentation for more details about Gateway API in OpenShift.
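If you still need to create the GatewayClass, a minimal manifest might look like this. The controllerName shown is the value used by the OpenShift-managed Gateway API implementation in recent releases; confirm it against the OpenShift documentation for your version.

```shell
# Minimal GatewayClass sketch; the controllerName is assumed from recent
# OpenShift releases -- confirm it for your version before applying.
oc apply -f - <<'EOF'
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: openshift-default
spec:
  controllerName: openshift.io/gateway-controller/v1
EOF
```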

Installation Steps

  1. Ensure you’re logged into your cluster as a cluster-admin user:
oc whoami
oc get nodes
  2. Install the developer preview release of Models as a Service.

    1. Create a namespace for the developer preview:

      oc create ns maas-api
      
    2. From the root of the cloned repository, run the following and ensure the rendered values look correct for your cluster:

      ./dev-preview/render.sh
      
    3. Apply the rendered developer preview overlay with the following:

      oc apply -k ./dev-preview
      
  3. Copy charts/maas-code-assistant/values.yaml to edit it:

cp charts/maas-code-assistant/values.yaml environment.yaml
  4. Edit the file and update the following sections to match your environment:

    1. global.wildcardDomain and global.wildcardCertName

      1. You can recover the proper values by running the following:
      oc get ingresscontroller -n openshift-ingress-operator default -ojsonpath='{.status.domain}{"\n"}'
      oc get ingresscontroller -n openshift-ingress-operator default -ojsonpath='{.spec.defaultCertificate.name}{"\n"}'
      
    2. grafana.namespace and grafana.selectors

      1. Use the Namespace of your Grafana resource for the Grafana Operator.
      2. Set selectors to match labels on your Grafana instance. For example, if you get the following output:
      oc get grafana grafana -n grafana -ojsonpath='{.metadata.labels}' | jq .
      

      {
        "app": "grafana"
      }

      You should set selectors to app: grafana.

    3. If you have deployed the openshift-default GatewayClass, as instructed above, configure it to not be managed by the chart by setting openshift-ai.gatewayClass.create to false.

  5. Update the tiers section to map your users to the default MaaS tiers as desired.

    1. For example, if you have users named “bob,” “sue,” and “tom,” and would like them all to be in the enterprise tier, with user “sally” in the premium tier and “frank” in the free tier, use the following value for tiers:
    tiers:
      free:
        users:
          - frank
      premium:
        users:
          - sally
      enterprise:
        users:
          - bob
          - sue
          - tom
    
  6. Make any tweaks necessary to the models array to ensure the workloads will be scheduled on your GPU-enabled nodes. This may involve changing the tolerations, adjusting the resources, or adding a nodeSelector field to each model and configuring it with a valid node selector for the pod template.

  7. Install the quickstart with Helm:

helm install maas-code-assistant ./charts/maas-code-assistant -f environment.yaml

Note that, depending on your environment, the openshift-ai-inference Gateway may already be deployed in your cluster, producing error output such as Error: INSTALLATION FAILED: Unable to continue with install: Gateway "openshift-ai-inference" in namespace "openshift-ingress" exists and cannot be imported into the current release. If this is the case, update your environment.yaml to set openshift-ai.gateway.create to false.
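Rather than editing environment.yaml, the same value can be passed on the command line when retrying the install; the following sketch assumes the release and chart paths used in the steps above.

```shell
# Retry the install with the chart told not to manage the existing Gateway.
# --set openshift-ai.gateway.create=false is equivalent to adding that
# value to environment.yaml.
helm upgrade --install maas-code-assistant ./charts/maas-code-assistant \
  -f environment.yaml \
  --set openshift-ai.gateway.create=false
```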
