Chapter 3. Deploying NeMo Guardrails
NeMo Guardrails is currently available in Red Hat OpenShift AI as a Technology Preview feature. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process. For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.
You can use NVIDIA NeMo Guardrails to add guardrails and safety controls to your deployed models in Red Hat OpenShift AI. With the NeMo Guardrails framework, you can control the input and output of large language models by defining rails for sensitive data detection, content filtering, and custom validation rules.
Prerequisites
- You have installed Red Hat OpenShift AI.
- You have logged in to OpenShift AI.
- You have deployed a model on the model-serving platform that you want to add guardrails to.
- You have cluster administrator permissions or sufficient permissions to create service accounts, secrets, and custom resources in your project namespace.
3.1. About NeMo Guardrails
You can use NeMo Guardrails to apply guardrails to your large language model (LLM) in Red Hat OpenShift AI. With NeMo Guardrails, you can add guardrails between the application code and the LLM. The TrustyAI Service Operator integrates NeMo Guardrails so that you can safeguard your LLM-based applications on Red Hat OpenShift.
The following table compares NeMo Guardrails and the TrustyAI implementation of the FMS-Guardrails Orchestrator:
| Feature | FMS Guardrails Architecture | NeMo Guardrails Architecture |
|---|---|---|
| Central Component | Guardrails Orchestrator | NeMo Guardrails Server |
| Deployment Resource | Guardrails CR | NeMo-Guardrails CR |
| Detection Mechanism | Built-in detectors that are external to the Orchestrator | Custom Python functions (registered with the @action decorator) that run as internal detectors within the NeMo server pod, plus external, built-in detectors |
| Operational Flow | Orchestrator watches and calls detector services; detection flows are fixed in the ConfigMap | NeMo server coordinates internal logic and external calls; Colang can be used to program the detection flow |
| Shared Operator | Managed by the TrustyAI Operator | Managed by the TrustyAI Operator |
| Namespace Location | Deployed within Model Namespaces | Deployed within Model Namespaces |
| Inference Path | User | User |
| Language Stack | Rust-based (Tokio) | Python-based (FastAPI) |
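
To make the Detection Mechanism row more concrete, the following is a minimal sketch of custom Python detectors of the kind that could live in an actions.py file. The action names check_message_length and check_forbidden_words, the 100-word limit, and the forbidden word list are illustrative assumptions based on the verification examples later in this chapter; they are not the configuration that ships with the product. The import path and the user_message key in the context dictionary are assumed to follow the NeMo Guardrails Python actions interface, and the fallback decorator exists only so that the sketch runs where the nemoguardrails package is not installed.

```python
import asyncio
import re
from typing import Optional

try:
    from nemoguardrails.actions import action
except ImportError:
    # Fallback no-op decorator so this sketch runs without nemoguardrails.
    def action(name: Optional[str] = None, **_kwargs):
        def decorator(func):
            return func
        return decorator

MAX_WORDS = 100                 # assumed limit, taken from the verification example
FORBIDDEN_WORDS = {"violence"}  # assumed list, taken from the verification example


@action(name="check_message_length")
async def check_message_length(context: Optional[dict] = None) -> bool:
    """Return True when the user message has at most MAX_WORDS words."""
    message = (context or {}).get("user_message") or ""
    return len(message.split()) <= MAX_WORDS


@action(name="check_forbidden_words")
async def check_forbidden_words(context: Optional[dict] = None) -> bool:
    """Return True when the user message contains no forbidden word."""
    message = (context or {}).get("user_message") or ""
    words = set(re.findall(r"[a-z']+", message.lower()))
    return words.isdisjoint(FORBIDDEN_WORDS)


# Local check outside the NeMo server: the forbidden word is detected.
print(asyncio.run(check_forbidden_words({"user_message": "I yearn for violence"})))  # False
```

A Colang flow in rails.co would then call such an action and return a refusal message when the action returns False.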
3.2. Deploying the NeMo Guardrails service
Deploy NVIDIA NeMo Guardrails to add guardrails and safety controls to your deployed models in Red Hat OpenShift AI.
NeMo Guardrails provides a framework for controlling the input and output of large language models, so that you can define guardrails for sensitive data detection, content filtering, and custom validation rules. For more information, see the NVIDIA NeMo Guardrails Configuration Guide.
Prerequisites
- You have installed Red Hat OpenShift AI.
- You have logged in to OpenShift AI.
- You have deployed a model on the model-serving platform that you want to add guardrails to.
- You have cluster administrator permissions or sufficient permissions to create service accounts, secrets, and custom resources in your project namespace.
- You know the internal service URL for your deployed model predictor.
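
The internal service URL in the last prerequisite typically looks like the example used later in this procedure (https://phi3-predictor.model-namespace.svc.cluster.local:8443/v1). The helper below is a small sketch that assembles a URL of that shape. The function name, the -predictor suffix, port 8443, and the /v1 path are assumptions generalized from that single example, so confirm the actual service name and port in your cluster before relying on the result.

```python
def build_predictor_url(
    model_name: str,
    namespace: str,
    port: int = 8443,        # assumed from the phi3 example
    api_path: str = "/v1",   # assumed from the phi3 example
) -> str:
    """Assemble an in-cluster URL of the form used in the phi3 example."""
    host = f"{model_name}-predictor.{namespace}.svc.cluster.local"
    return f"https://{host}:{port}{api_path}"


print(build_predictor_url("phi3", "model-namespace"))
# https://phi3-predictor.model-namespace.svc.cluster.local:8443/v1
```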
Procedure
Create a service account for the NeMo Guardrails service. For example, create a YAML file named nemo-guardrails-service-account.yaml as follows:

Example nemo-guardrails-service-account.yaml file

    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: nemo-guardrails-service-account

Apply the nemo-guardrails-service-account.yaml file:

    $ oc apply -f nemo-guardrails-service-account.yaml

Create a role binding for the NeMo Guardrails service account. For example, create a YAML file named role-binding.yaml.

Apply the role-binding.yaml file:

    $ oc apply -f role-binding.yaml

Create a secret containing an API token for the service account. The --duration=8760h option requests a token that is valid for 365 days:

    $ oc create secret generic api-token-secret \
        --from-literal=token=$(oc create token nemo-guardrails-service-account --duration=8760h)

Create a ConfigMap containing the NeMo Guardrails configuration. For example, create a file named nemo-config.yaml. The configuration uses the following placeholders and keys:
- <model_predictor_url>: The internal service URL for your model predictor (for example, https://phi3-predictor.model-namespace.svc.cluster.local:8443/v1).
- <model_name>: The name of your deployed model (for example, phi3).
- data.config.yaml.models.type.engine: To guardrail a vLLM model, you must set the engine of the first model to openai.
- rails.co: The Colang flows that implement the guardrails.
- actions.py: The custom actions implemented in Python. For more information, see Python Actions.
Apply the nemo-config.yaml file:

    $ oc apply -f nemo-config.yaml

Create the NeMo Guardrails custom resource (CR). For example, create a file named nemo-guardrails-cr.yaml.

Deploy the NeMo Guardrails CR:

    $ oc apply -f nemo-guardrails-cr.yaml

Verification
Retrieve the NeMo Guardrails route:

    $ GUARDRAILS_ROUTE=https://$(oc get routes/nemo-guardrails -o jsonpath='{.status.ingress[0].host}')

Test the NeMo Guardrails service with a safe request:

    $ curl -k -X POST $GUARDRAILS_ROUTE/v1/chat/completions \
        -H "Content-Type: application/json" \
        -H "Authorization: Bearer $(oc whoami -t)" \
        -d '{"messages":[{"role":"user","content":"Hi!"}]}'

Example output

    {"messages":[{"role":"assistant","content":"Hello there! I'm so glad you're here. What can I help you with today?"}]}

Test the guardrails by sending requests that contain forbidden input or that ask for forbidden output, such as the following examples:

- Input containing the word "violence":

      $ curl -k -X POST $GUARDRAILS_ROUTE/v1/chat/completions \
          -H "Content-Type: application/json" \
          -H "Authorization: Bearer $(oc whoami -t)" \
          -d '{"messages":[{"role":"user","content":"I yearn for violence"}]}'

  Example output

      {"messages":[{"role":"assistant","content":"I can't help with that type of request. Please ask something else."}]}

- Request for a name output:

      $ curl -k -X POST $GUARDRAILS_ROUTE/v1/chat/completions \
          -H "Content-Type: application/json" \
          -H "Authorization: Bearer $(oc whoami -t)" \
          -d '{"messages":[{"role":"user","content": "In just two words, provide a typical American first and last name."}]}'

  Example output

      {"messages":[{"role":"assistant","content":"I don't know the answer to that."}]}

  Note: The output without guardrails is "John Smith".

- Input that is too long:

      $ curl -k -X POST $GUARDRAILS_ROUTE/v1/chat/completions \
          -H "Content-Type: application/json" \
          -H "Authorization: Bearer $(oc whoami -t)" \
          -d '{"messages":[{"role":"user","content":"<long_message>"}]}'

  Replace <long_message> with a message containing more than 100 words.

  Example output

      {"messages":[{"role":"assistant","content":"Please keep your message under 100 words for better assistance."}]}
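
The verification requests can also be scripted. The following sketch, which uses only the Python standard library, builds the same POST request that the curl commands send to /v1/chat/completions. The function names, the placeholder route https://nemo-guardrails.example.com, and the placeholder token are hypothetical; substitute the value of GUARDRAILS_ROUTE and the output of oc whoami -t. The send_chat_request function disables certificate verification by default to mirror curl -k, which is suitable only for test clusters.

```python
import json
import ssl
import urllib.request


def build_chat_request(route: str, token: str, content: str) -> urllib.request.Request:
    """Build the POST request that the curl examples send."""
    payload = {"messages": [{"role": "user", "content": content}]}
    return urllib.request.Request(
        url=f"{route.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def send_chat_request(request: urllib.request.Request, verify_tls: bool = False) -> dict:
    """Send the request and return the decoded JSON response."""
    context = ssl.create_default_context()
    if not verify_tls:
        # Mirrors curl -k: skip hostname and certificate checks.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(request, context=context, timeout=60) as response:
        return json.loads(response.read().decode("utf-8"))


def long_message(word_count: int = 101) -> str:
    """Generate a message longer than the 100-word limit in the example."""
    return " ".join(["word"] * word_count)


# Build (but do not send) a request against a placeholder route and token.
request = build_chat_request("https://nemo-guardrails.example.com", "example-token", "Hi!")
print(request.full_url)
print(request.data.decode("utf-8"))
```

To run the too-long-input check, pass long_message() as the content argument and send the resulting request with send_chat_request.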