Accelerate enterprise software development with NVIDIA and MaaS

Optimize private app development using NVIDIA Nemotron models through Models-as-a-Service on your own multi-tenant infrastructure in Red Hat AI.


This content is authored by Red Hat experts, but has not yet been tested on every supported configuration.

Detailed description

Developing software with speed and efficiency is a competitive necessity. Developers are often overwhelmed and slowed down by repetitive code, complicated debugging and testing, and the constant need to learn new technologies. AI-powered coding assistance can help, but how do you leverage it securely and cost-effectively?

For organizations bound by strict data privacy requirements, regulations, or specific performance needs, publicly hosted AI services are often not an option. As usage expands, you also need to keep costs under control. Models as a Service (MaaS) solves this by enabling centralized IT teams to host and manage private models that remote teams can consume easily and securely. This keeps proprietary data within the organization’s boundaries while giving developers access to the generative AI technology they need. By providing access to the models via API tokens, administrators can also enforce specific rate limits and quotas. This approach doesn’t just simplify access and usage; it allows organizations to monitor metrics, forecast capacity and compute needs, and manage chargebacks with precision.
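The per-token rate limiting described above can be sketched with a simple token bucket, which is conceptually what a MaaS gateway enforces for each issued API token. The limits below are made-up illustrations, not the quickstart's defaults:

```python
import time

class TokenBucket:
    """Minimal per-API-token rate limiter: a burst capacity that refills over time."""
    def __init__(self, capacity, refill_per_sec):
        self.capacity = capacity
        self.tokens = float(capacity)
        self.refill_per_sec = refill_per_sec
        self.last = time.monotonic()

    def allow(self):
        # Refill proportionally to elapsed time, capped at capacity.
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# One bucket per issued API token; hypothetical limits for illustration only.
buckets = {"team-a-token": TokenBucket(capacity=5, refill_per_sec=1)}

# 10 back-to-back requests: only the burst capacity passes, the rest are throttled.
granted = sum(buckets["team-a-token"].allow() for _ in range(10))
print(granted)  # 5
```

In the quickstart itself this enforcement is handled by the MaaS governance layer, not application code; the sketch only shows the underlying idea.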

This quickstart demonstrates how you can easily deploy a private AI code assistant powered by NVIDIA Nemotron models and delivered through Red Hat AI's integrated Models as a Service (MaaS) offering. Developers access the assistant through OpenShift DevSpaces, a containerized cloud-native IDE included in OpenShift.
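Once a developer has an API token, the private endpoint is consumed like any OpenAI-compatible API. A minimal sketch of building such a request follows; the endpoint URL and token are placeholders that your MaaS administrator would issue, and only the payload construction is shown (no network call):

```python
import json

# Hypothetical values: the real endpoint URL and API token come from your
# MaaS administrator. The model name matches this quickstart's deployment.
ENDPOINT = "https://maas.apps.example.com/v1/chat/completions"
API_TOKEN = "sk-example-not-a-real-token"

payload = {
    "model": "nemotron-3-nano-30b-a3b-fp8",
    "messages": [
        {"role": "system", "content": "You are a coding assistant."},
        {"role": "user", "content": "Write a Python function that reverses a string."},
    ],
    "max_tokens": 256,
}
headers = {
    "Authorization": f"Bearer {API_TOKEN}",
    "Content-Type": "application/json",
}

# Send with any HTTP client, for example:
#   requests.post(ENDPOINT, headers=headers, data=json.dumps(payload), timeout=60)
body = json.dumps(payload)
```

Because the gateway identifies callers by the bearer token, the same request shape works for every tier; only the rate limits applied behind the scenes differ.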

Architecture diagrams

Code Assistant w/ MaaS Architecture

This diagram illustrates a models-as-a-service architecture on Red Hat AI, including the model deployments as well as the code assistant application running in OpenShift DevSpaces.

| Layer/Component | Technology | Purpose/Description |
| --- | --- | --- |
| Orchestration | Red Hat AI Enterprise | Container orchestration and comprehensive AI platform |
| Inference | vLLM and llm-d | High-performance inference engine for generative AI model deployment, with Kubernetes-native distributed inference capabilities via llm-d |
| LLM | nemotron-3-nano-30b-a3b-fp8 | A quantized 30B-parameter hybrid Mamba-Transformer MoE model with a 1M-token context window, designed for efficient reasoning, chat, and agentic AI applications |
| Models-as-a-Service | Red Hat AI Enterprise | Integrated LLM governance layer that provides rate-limited model access with usage tracking and chargeback across teams |
| GPU Acceleration | NVIDIA GPU Operator | Enables GPUs and manages drivers, DCGM, the container toolkit, and MIG capabilities for GPU acceleration |
| Development Environment | OpenShift DevSpaces | Provides IDE instances so development teams can develop and deploy on the same cluster |
| Observability | Prometheus Operator | Monitors model inference metrics and GPU telemetry |
| Dashboard | Grafana | Surfaces metrics scraped by Prometheus in custom Grafana dashboards |
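As a rough illustration of the chargeback idea mentioned for the MaaS layer, per-team token usage can be rolled up into costs. The usage records and per-token prices below are invented for illustration; real numbers would come from the Prometheus metrics surfaced in the Grafana dashboards:

```python
# Hypothetical usage records and prices, for illustration only.
usage = [
    {"team": "payments", "tier": "enterprise", "tokens": 1_200_000},
    {"team": "mobile", "tier": "premium", "tokens": 400_000},
    {"team": "labs", "tier": "free", "tokens": 50_000},
]
price_per_1k_tokens = {"free": 0.0, "premium": 0.02, "enterprise": 0.05}

# Cost per team = (tokens / 1000) * tier price.
chargeback = {
    rec["team"]: round(rec["tokens"] / 1000 * price_per_1k_tokens[rec["tier"]], 2)
    for rec in usage
}
print(chargeback)  # {'payments': 60.0, 'mobile': 8.0, 'labs': 0.0}
```

The quickstart does not prescribe a pricing model; this is only a sketch of how the usage data it collects could feed an internal chargeback calculation.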

Requirements

Minimum hardware requirements

  • One NVIDIA GPU node with at least 48GB VRAM for the Nemotron model
  • One NVIDIA GPU node with at least 48GB VRAM for the gpt-oss model

Note: Models in this quickstart were tested with 2 L40S GPU instances on AWS (instance type g6e.2xlarge).
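As a back-of-envelope check on the 48GB figure, an FP8-quantized 30B-parameter model's weights alone need roughly 30 GB, leaving headroom for the KV cache and runtime overhead. The arithmetic below is a sketch; the 10 GB overhead figure is an assumption, not a measured value:

```python
# Rough VRAM estimate for a quantized model; illustrative only. Real usage
# depends on KV cache size, context length, and vLLM configuration.
params_billions = 30
bytes_per_param = 1  # FP8 stores roughly one byte per parameter

weights_gb = params_billions * bytes_per_param  # ~30 GB of weights
overhead_gb = 10  # assumed headroom for KV cache, activations, CUDA context
required_gb = weights_gb + overhead_gb

print(required_gb <= 48)  # True: fits one 48 GB L40S under these assumptions
```

A BF16 version of the same model would need roughly twice the weight memory, which is why the quantized variant is what fits comfortably on a single 48GB GPU.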

Minimum software requirements

  • Red Hat OpenShift 4.20
  • Helm CLI
  • OpenShift Client CLI
  • Bash shell available in PATH
  • sed available in PATH (both the macOS/POSIX version and common GNU versions work)

Required user permissions

  • Regular user permissions for usage of Models-as-a-Service enabled endpoint, access to DevSpaces workspace, and access to Grafana dashboard for viewing usage data.
  • Cluster Admin access needed for any changes to model deployments or MaaS configurations.

Deploy

The following instructions deploy the quickstart to your Red Hat AI environment using an automated, script-based installation. The script configures the necessary prerequisites for your environment and wires everything together, removing the need for additional configuration.

Please see the advanced deployment section for details on setting up your own prerequisites and deploying the quickstart with more control.

Prerequisites

  • OpenShift cluster (specific version is specified in the software requirements section)
    • Optional (recommended): trusted certificates managed for the OpenShift Router, as documented
  • A default StorageClass needs to be configured. If your cluster is on a cloud provider, this is probably available out of the box. If you're on bare metal or some hypervisor environments, you may need to install additional operators to enable a default StorageClass. See the documentation for OpenShift Data Foundation or the LVM Storage Operator documentation for installation on bare metal
  • OpenShift cluster has GPUs available
  • The NVIDIA GPU Operator is installed and configured with a ClusterPolicy (or other API) to configure the driver and make the resources available to Kubernetes to schedule
  • You do not have other workloads or configurations in the cluster, meaning:
    • An identity provider is not deployed or configured
    • Red Hat OpenShift AI is not installed
    • Red Hat Connectivity Link is not deployed or configured
    • Red Hat OpenShift Dev Spaces is not deployed

Installation Steps

  1. Clone the quickstart repository:
git clone https://github.com/rh-ai-quickstart/maas-code-assistant.git
  2. Change into the directory:
cd maas-code-assistant
  3. Ensure you’re logged into your cluster as a cluster-admin user, such as kube:admin or system:admin:
oc whoami
  4. Run all-in-one.sh. Enter passwords for the admin and user accounts when prompted (these are saved in the .env file after the first run of the script, so you won’t be prompted again).
./all-in-one.sh

[!NOTE] This installation will leave the kubeadmin user in your cluster, prompting you to select a source to log in from. The rhbk option added to this menu is required to use the users and passwords specified above, and to be able to use MaaS models. If you would like to remove the prompt to select an identity provider and have it default to the Red Hat build of Keycloak, you can edit environment.yaml.tpl and set keycloak.removeKubeAdmin to true before running the script.

Delete

To remove the core quickstart components (models, Dev Spaces workspaces, etc.) run the following:

helm uninstall maas-code-assistant

To clean up the dependencies, such as OpenShift AI, follow their documented uninstallation procedures: remove their Operands first and allow the operators to reconcile and complete the removal, then uninstall the operators themselves.

References

  • vLLM: The High-Throughput and Memory-Efficient inference and serving engine for LLMs.
  • llm-d: a Kubernetes-native high-performance distributed LLM inference framework.
  • Red Hat OpenShift DevSpaces: a container-based, in-browser development environment offered by Red Hat that facilitates cloud-native development directly within the OpenShift ecosystem. Included within the OpenShift product offering.
  • NVIDIA Nemotron: a family of open models with open weights, training data, and recipes, delivering leading efficiency and accuracy for building specialized AI agents.
  • NVIDIA GPU Operator: uses the operator framework within Kubernetes to automate the management of all NVIDIA software components needed to provision GPU.

Advanced Deployment

This advanced deployment option will allow you to control the deployment of all prerequisites separately and tailor it to your specific environment.

Use this deployment path if you:

  • Have a configured cluster with some or all of the prerequisites already deployed.
  • Prefer a different configuration path than the defaults set in the quickstart repository installation script.
  • Are using the cluster for other workloads and therefore need to customize the installation to avoid conflict with existing cluster resources.

Prerequisites

The following prerequisites are required in your environment to prevent any conflicts with the quickstart:

  • Users have been configured with OpenShift OAuth, backed by OIDC or some other auth method such as htpasswd, as documented.
  • OpenShift cluster and user-workload monitoring is configured, as documented.
  • Grafana is deployed and managed through the Grafana Operator, in the grafana namespace.
    • An example Grafana operand, with all RBAC and resources wired up to User Workload Monitoring, is available in docs/examples/grafana.yaml. It expects that your Grafana Operator installation was namespace scoped, and deployed to the grafana namespace, and that your in-cluster registry is configured as documented. You can configure it differently to not depend on the registry.
  • Red Hat OpenShift Dev Spaces is deployed, as documented.
    • A basic CheCluster resource is configured, as in steps 2 and 3 of the above.
  • The cert-manager Operator for Red Hat OpenShift has been deployed as documented.
  • The Leader Worker Set Operator has been deployed, as documented.
  • Red Hat OpenShift AI version 3.3.0 has been deployed from the fast-3.x, stable-3.x, or stable-3.3 channels, as documented.
    • A Data Science Cluster has been created that enables at least the Dashboard, KServe, and Llama Stack Operator components, as documented.
    • Note that using Manual approval mode with the startingCSV set to rhods-operator.3.3.0 is recommended to stay on the version tested with this code base.
  • Red Hat Connectivity Link has been deployed from the stable channel, as documented.
    • A Kuadrant resource has been installed in the kuadrant-system namespace, as documented.
    • The Authorino resource that gets created from this Kuadrant instance has been modified with the following to enable TLS on the Authorino endpoint:
      oc annotate service -n kuadrant-system authorino-authorino-authorization service.beta.openshift.io/serving-cert-secret-name=authorino-server-cert --overwrite
      oc patch authorino -n kuadrant-system authorino --type=merge --patch '{"spec": {"listener": {"tls": {"enabled": true, "certSecretRef": {"name": "authorino-server-cert"}}}}}'
      
  • You have created the openshift-default GatewayClass object for Gateway API in OpenShift, and are able to create Gateway instances using your cluster's load balancer and infrastructure configuration. See the documentation for more details about Gateway API in OpenShift.
  • You have created the maas-default-gateway Gateway object in the openshift-ingress namespace using an infrastructure configuration that is supported for your environment, and it shows as programmed when verified as documented. It additionally needs the opendatahub.io/managed: "false" label and the opendatahub.io/managed: "false" and security.opendatahub.io/authorino-tls-bootstrap: "true" annotations set. Without these, policy enforcement will not work as expected.
    • An example of some possible Gateway configurations is available as a Helm template in this repository, at charts/dependency-operators/files/openshift-ai/gateway.yaml. You can use this template as the basis of a custom manifest by removing the templating syntax and configuring it to suit your environment.

Installation Steps

  1. Ensure you’re logged into your cluster as a cluster-admin user:
oc whoami
oc get nodes
  2. Copy charts/maas-code-assistant/values.yaml to edit it:
cp charts/maas-code-assistant/values.yaml environment.yaml
  3. Edit the file and update the following sections to match your environment:

    1. global.wildcardDomain and global.wildcardCertName

      1. You can recover the proper values by running the following:
      oc get ingresscontroller -n openshift-ingress-operator default -ojsonpath='{.status.domain}{"\n"}'
      oc get ingresscontroller -n openshift-ingress-operator default -ojsonpath='{.spec.defaultCertificate.name}{"\n"}'
      
    2. If you are on a bare metal or non-cloud hypervisor environment, your integrated image registry might be disabled. If it is, update global.toolsImage to refer to a container image that at least contains oc.

      1. You can get one such image for your cluster by running the following:
      oc adm release info --image-for=tools
      
    3. grafana.namespace and grafana.selectors

      1. Use the Namespace of your Grafana resource for the Grafana Operator.
      2. Set selectors to match labels on your Grafana instance. For example, if you get the following output:
      oc get grafana grafana -n grafana -ojsonpath='{.metadata.labels}' | jq .
      

      {
        "app": "grafana"
      }
      You should set selectors to app: grafana.

  4. Update the tiers section to map users to your desired default MaaS tiers.

    1. For example, if you have users named “bob,” “sue,” and “tom,” and would like them all to be in the enterprise tier, with user “sally” in the premium tier and “frank” in the free tier, use the following value for tiers:
    tiers:
      free:
        users:
          - frank
      premium:
        users:
          - sally
      enterprise:
        users:
          - bob
          - sue
          - tom
    
    2. If you would like to change the request rates and token rates as well, feel free to do so.
  5. Complete any tweaks necessary to the models array to ensure the workloads are scheduled on your GPU-enabled nodes. This may involve changing the tolerations, adjusting the resources, adding the nodeSelector field to each model and configuring it with a valid nodeSelector for the pod template, and so on.

  6. Install the quickstart with helm:

helm install maas-code-assistant ./charts/maas-code-assistant -f environment.yaml

Tags

  • Product: Red Hat AI Enterprise
  • Use case: Code development
  • Industry: Adopt and scale AI