Chapter 3. Deploying NeMo Guardrails


Important

NeMo Guardrails is currently available in Red Hat OpenShift AI as a Technology Preview feature. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process. For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.

You can use NVIDIA NeMo Guardrails to add guardrails and safety controls to your deployed models in Red Hat OpenShift AI. With the NeMo Guardrails framework, you can control the input and output of large language models by defining rails for sensitive data detection, content filtering, and custom validation rules.

Prerequisites

  • You have installed Red Hat OpenShift AI.
  • You have logged in to OpenShift AI.
  • You have deployed a model on the model-serving platform that you want to add guardrails to.
  • You have cluster administrator permissions or sufficient permissions to create service accounts, secrets, and custom resources in your project namespace.

3.1. About NeMo Guardrails

You can use NeMo Guardrails to apply guardrails to your large language model (LLM) in Red Hat OpenShift AI. With NeMo Guardrails, you can add guardrails between the application code and the LLM. The TrustyAI Service Operator integrates NeMo Guardrails so that you can safeguard your LLM-based applications on Red Hat OpenShift.

The following table compares NeMo Guardrails and the TrustyAI implementation of the FMS-Guardrails Orchestrator:

Table 3.1. Comparison of Guardrail Architectures

Feature | FMS Guardrails Architecture | NeMo Guardrails Architecture
Central Component | Guardrails Orchestrator | NeMo Guardrails Server
Deployment Resource | Guardrails CR | NemoGuardrails CR
Detection Mechanism | Built-in detectors that are external to the Orchestrator | Custom Python functions as internal detectors (using the @action decorator) that execute within the NeMo server pod, plus external built-in detectors
Operational Flow | The Orchestrator watches and calls detector services; detection flows are fixed in the ConfigMap | The NeMo server coordinates internal logic and external calls; Colang can be used for programmable detection flows
Shared Operator | Managed by the TrustyAI Operator | Managed by the TrustyAI Operator
Namespace Location | Deployed within model namespaces | Deployed within model namespaces
Inference Path | User → Orchestrator → vLLM model | User → NeMo Server → vLLM model
Language Stack | Rust-based (Tokio) | Python-based (FastAPI)

3.2. Deploying the NeMo Guardrails service

Deploy NVIDIA NeMo Guardrails to add guardrails and safety controls to your deployed models in Red Hat OpenShift AI.

NeMo Guardrails provides a framework for controlling the input and output of large language models, enabling you to define guardrails for sensitive data detection, content filtering, and custom validation rules. For more information, see NVIDIA NeMo Guardrails Configuration Guide.

Prerequisites

  • You have installed Red Hat OpenShift AI.
  • You have logged in to OpenShift AI.
  • You have deployed a model on the model-serving platform that you want to add guardrails to.
  • You have cluster administrator permissions or sufficient permissions to create service accounts, secrets, and custom resources in your project namespace.
  • You know the internal service URL for your deployed model predictor.

Procedure

  1. Create a service account for the NeMo Guardrails service. For example, create a YAML file named nemo-guardrails-service-account.yaml as follows:

    Example nemo-guardrails-service-account.yaml file

    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: nemo-guardrails-service-account

  2. Apply the nemo-guardrails-service-account.yaml file:

    $ oc apply -f nemo-guardrails-service-account.yaml
  3. Create a role binding for the NeMo Guardrails service account. For example, create a YAML file named role-binding.yaml as follows:

    Example role-binding.yaml file

    kind: RoleBinding
    apiVersion: rbac.authorization.k8s.io/v1
    metadata:
      name: nemo-guardrails-service-account-view
    subjects:
      - kind: ServiceAccount
        name: nemo-guardrails-service-account
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: view

  4. Apply the role-binding.yaml file:

    $ oc apply -f role-binding.yaml
  5. Create a secret containing an API token for the service account:

    $ oc create secret generic api-token-secret \
      --from-literal=token=$(oc create token nemo-guardrails-service-account --duration=8760h)
  6. Create a ConfigMap containing the NeMo Guardrails configuration. For example, create a file named nemo-config.yaml as follows:

    Example nemo-config.yaml file

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: nemo-config
    data:
      config.yaml: |
        models:
          - type: main
            engine: openai
            parameters:
              openai_api_base: "<model_predictor_url>"
              model_name: "<model_name>"
        rails:
          config:
            sensitive_data_detection:
              input:
                entities:
                  - EMAIL_ADDRESS
              output:
                entities:
                  - PERSON
          input:
            flows:
              - detect sensitive data on input
              - check message length
              - check forbidden words
          output:
            flows:
              - detect sensitive data on output
      rails.co: |
        define flow check message length
          $length_result = execute check_message_length
          if $length_result == "blocked_too_long"
            bot inform message too long
            stop
          if $length_result == "warning_long"
            bot warn message long
    
        define bot inform message too long
          "Please keep your message under 100 words for better assistance."
    
        define bot warn message long
          "That's quite detailed! I'll help as best I can."
    
        define flow check forbidden words
          $forbidden_result = execute check_forbidden_words
          if $forbidden_result != "allowed"
            bot inform forbidden content
            stop
    
        define bot inform forbidden content
          "I can't help with that type of request. Please ask something else."
    
    
      actions.py: |
        from typing import Optional
        from nemoguardrails.actions import action
    
        @action(is_system_action=True)
        async def check_message_length(context: Optional[dict] = None) -> str:
            """Check if user message is within acceptable length limits."""
            user_message = context.get("user_message", "")
            word_count = len(user_message.split())
            MAX_WORDS = 100
            if word_count > MAX_WORDS:
                return "blocked_too_long"
            elif word_count > MAX_WORDS * 0.8:
                return "warning_long"
            return "allowed"
    
        @action(is_system_action=True)
        async def check_forbidden_words(context: Optional[dict] = None) -> str:
            """Check for forbidden words or topics."""
            user_message = context.get("user_message", "").lower()
            forbidden_topics = {
                "security": ["password", "hack", "exploit", "vulnerability"],
                "inappropriate": ["violence", "illegal", "harmful"],
                "competitors": ["chatgpt", "openai", "claude", "anthropic"],
            }
            for category, words in forbidden_topics.items():
                for word in words:
                    if word in user_message:
                        return f"blocked_{category}_{word}"
            return "allowed"

    • <model_predictor_url>: The internal service URL for your model predictor (for example, https://phi3-predictor.model-namespace.svc.cluster.local:8443/v1)
    • <model_name>: The name of your deployed model (for example, phi3)
    • data.config.yaml (models engine): To guardrail a vLLM model, you must set the engine of the main model to openai, as shown in the configuration above.
    • rails.co: Colang flows that implement the guardrails.
    • actions.py: The custom actions implemented in Python. For more information, see Python Actions.
  7. Apply the nemo-config.yaml file:

    $ oc apply -f nemo-config.yaml
  8. Create the NeMo Guardrails custom resource (CR). For example, create a file named nemo-guardrails-cr.yaml:

    Example nemo-guardrails-cr.yaml file

    apiVersion: trustyai.opendatahub.io/v1alpha1
    kind: NemoGuardrails
    metadata:
      name: nemo-guardrails-cr
      annotations:
        security.opendatahub.io/enable-auth: 'true'
    spec:
      nemoConfigs:
        - name: nemo-config
          configMaps:
            - nemo-config
      env:
        - name: "OPENAI_API_KEY"
          valueFrom:
            secretKeyRef:
              name: "api-token-secret"
              key: "token"

  9. Deploy the NeMo Guardrails CR:

    $ oc apply -f nemo-guardrails-cr.yaml
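Because the custom actions in the ConfigMap are plain Python coroutines, you can sanity-check their logic locally before you apply the configuration. The following sketch mirrors the actions.py example from step 6 with the nemoguardrails @action decorator removed, so it runs without the library installed; the function names and thresholds are copied from the example, not from a deployed service.

```python
# Local sanity check of the actions.py logic from the ConfigMap,
# with the @action decorator removed so no nemoguardrails install
# is required. Thresholds and word lists mirror the example above.
import asyncio
from typing import Optional

MAX_WORDS = 100

async def check_message_length(context: Optional[dict] = None) -> str:
    """Mirror of the ConfigMap action: block or warn on long messages."""
    user_message = (context or {}).get("user_message", "")
    word_count = len(user_message.split())
    if word_count > MAX_WORDS:
        return "blocked_too_long"
    elif word_count > MAX_WORDS * 0.8:
        return "warning_long"
    return "allowed"

async def check_forbidden_words(context: Optional[dict] = None) -> str:
    """Mirror of the ConfigMap action: flag forbidden words by category."""
    user_message = (context or {}).get("user_message", "").lower()
    forbidden_topics = {
        "security": ["password", "hack", "exploit", "vulnerability"],
        "inappropriate": ["violence", "illegal", "harmful"],
        "competitors": ["chatgpt", "openai", "claude", "anthropic"],
    }
    for category, words in forbidden_topics.items():
        for word in words:
            if word in user_message:
                return f"blocked_{category}_{word}"
    return "allowed"

if __name__ == "__main__":
    print(asyncio.run(check_message_length({"user_message": "Hi!"})))
    print(asyncio.run(check_forbidden_words({"user_message": "I yearn for violence"})))
```

When a check returns a blocked result inside the deployed server, the matching rails.co flow stops the turn and the bot responds with the refusal message defined in the ConfigMap.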

Verification

  1. Retrieve the NeMo Guardrails route:

    $ GUARDRAILS_ROUTE=https://$(oc get routes/nemo-guardrails -o jsonpath='{.status.ingress[0].host}')
  2. Test the NeMo Guardrails service with a safe request:

    $ curl -k -X POST $GUARDRAILS_ROUTE/v1/chat/completions \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer $(oc whoami -t)" \
      -d '{"messages":[{"role":"user","content":"Hi!"}]}'

    Example output

    {"messages":[{"role":"assistant","content":"Hello there! I'm so glad you're here. What can I help you with today?"}]}

  3. Test the guardrails by sending requests that violate the configured rails, such as the following examples:

    • Input containing the word "violence":

      $ curl -k -X POST $GUARDRAILS_ROUTE/v1/chat/completions \
        -H "Content-Type: application/json" \
        -H "Authorization: Bearer $(oc whoami -t)" \
        -d '{"messages":[{"role":"user","content":"I yearn for violence"}]}'

      Example output

      {"messages":[{"role":"assistant","content":"I can't help with that type of request. Please ask something else."}]}

    • Request for a name output:

      $ curl -k -X POST $GUARDRAILS_ROUTE/v1/chat/completions \
        -H "Content-Type: application/json" \
        -H "Authorization: Bearer $(oc whoami -t)" \
        -d '{"messages":[{"role":"user","content": "In just two words, provide a typical American first and last name."}]}'

      Example output

      {"messages":[{"role":"assistant","content":"I don't know the answer to that."}]}

      Note

      The output without guardrails is "John Smith".

    • Input that is too long:

      $ curl -k -X POST $GUARDRAILS_ROUTE/v1/chat/completions \
        -H "Content-Type: application/json" \
        -H "Authorization: Bearer $(oc whoami -t)" \
        -d '{"messages":[{"role":"user","content":"<long_message>"}]}'

      Replace <long_message> with a message containing more than 100 words.

      Example output

      {"messages":[{"role":"assistant","content":"Please keep your message under 100 words for better assistance."}]}
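If you script these verification calls, the request and response bodies from the curl examples can be handled with a small helper. This is an illustrative sketch, not part of the product: the refusal strings are the bot messages defined in the rails.co example, the payload shape matches the /v1/chat/completions calls above, and actually sending the request (for example with requests or httpx) is left out.

```python
# Illustrative helper for scripting the verification requests above.
# The refusal strings come from the rails.co bot messages; the payload
# and response shapes match the curl examples in this section.
import json

GUARDRAIL_REFUSALS = (
    "I can't help with that type of request. Please ask something else.",
    "Please keep your message under 100 words for better assistance.",
)

def build_chat_body(content: str) -> str:
    """Serialize a /v1/chat/completions request body for one user message."""
    return json.dumps({"messages": [{"role": "user", "content": content}]})

def assistant_reply(response_body: str) -> str:
    """Extract the assistant message from a guardrails response body."""
    return json.loads(response_body)["messages"][0]["content"]

def was_blocked(response_body: str) -> bool:
    """Return True if the reply is one of the configured guardrail refusals."""
    return assistant_reply(response_body) in GUARDRAIL_REFUSALS
```

For example, passing the "message too long" response shown above to was_blocked returns True, while an ordinary assistant greeting returns False, which makes the checks easy to automate in a test suite.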
