Chapter 2. Using Guardrails for AI safety
Use the Guardrails tools to ensure the safety and security of your generative AI applications in production.
2.1. Detecting PII and sensitive data Copy linkLink copied to clipboard!
Protect user privacy by identifying and filtering personally identifiable information (PII) in LLM inputs and outputs using built-in regex detectors or custom detection models.
2.2. Detecting personally identifiable information (PII) by using Guardrails with Llama Stack Copy linkLink copied to clipboard!
The trustyai_fms Orchestrator server is an external provider for Llama Stack that allows you to configure and use the Guardrails Orchestrator and compatible detection models through the Llama Stack API. This implementation of Llama Stack combines Guardrails Orchestrator with a suite of community-developed detectors to provide robust content filtering and safety monitoring. Guardrails execution is independent of the configured vector store and does not require Milvus or pgvector to be enabled.
This example demonstrates how to use the built-in Guardrails Regex Detector to detect personally identifiable information (PII) with Guardrails Orchestrator as Llama Stack safety guardrails, using the LlamaStack Operator to deploy a distribution in your Red Hat OpenShift AI namespace.
Guardrails Orchestrator with Llama Stack is not supported on s390x, as it requires the LlamaStack Operator, which is currently unavailable for this architecture.
Prerequisites
- You have cluster administrator privileges for your OpenShift cluster.
You have installed the OpenShift CLI (
oc) as described in the appropriate documentation for your cluster:- Installing the OpenShift CLI for OpenShift Container Platform
- Installing the OpenShift CLI for Red Hat OpenShift Service on AWS
- You have a large language model (LLM) for chat generation or text classification, or both, deployed in your namespace.
A cluster administrator has installed the following Operators in OpenShift:
- Red Hat Connectivity Link version 1.1.1 or later.
You must uninstall OpenShift Service Mesh, version 2.6.7-0 or later, from your cluster.
Procedure
Configure your OpenShift AI environment with the following configurations in the
DataScienceCluster. Note that you must manually update thespec.llamastack.managementStatefield toManaged:Copy to Clipboard Copied! Toggle word wrap Toggle overflow Create a project in your OpenShift AI namespace:
PROJECT_NAME="lls-minimal-example" oc new-project $PROJECT_NAME
PROJECT_NAME="lls-minimal-example" oc new-project $PROJECT_NAMECopy to Clipboard Copied! Toggle word wrap Toggle overflow Deploy the Guardrails Orchestrator with regex detectors by applying the Orchestrator configuration for regex-based PII detection:
Copy to Clipboard Copied! Toggle word wrap Toggle overflow In the same namespace, create a Llama Stack distribution:
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
— After deploying the LlamaStackDistribution CR, a new pod is created in the same namespace. This pod runs the LlamaStack server for your distribution. —
-
Once the Llama Stack server is running, use the
/v1/shieldsendpoint to dynamically register a shield. For example, register a shield that uses regex patterns to detect personally identifiable information (PII). Open a port-forward to access it locally:
oc -n $PROJECT_NAME port-forward svc/llama-stack 8321:8321
oc -n $PROJECT_NAME port-forward svc/llama-stack 8321:8321Copy to Clipboard Copied! Toggle word wrap Toggle overflow Use the
/v1/shieldsendpoint to dynamically register a shield. For example, register a shield that uses regex patterns to detect personally identifiable information (PII):Copy to Clipboard Copied! Toggle word wrap Toggle overflow Verify that the shield was registered:
curl -s http://localhost:8321/v1/shields | jq '.'
curl -s http://localhost:8321/v1/shields | jq '.'Copy to Clipboard Copied! Toggle word wrap Toggle overflow The following output indicates that the shield has been registered successfully:
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Once the shield has been registered, verify that it is working by sending a message containing PII to the
/v1/safety/run-shieldendpoint:Email detection example:
Copy to Clipboard Copied! Toggle word wrap Toggle overflow This should return a response indicating that the email was detected:
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Social security number (SSN) detection example:
Copy to Clipboard Copied! Toggle word wrap Toggle overflow This should return a response indicating that the SSN was detected:
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Credit card detection example:
Copy to Clipboard Copied! Toggle word wrap Toggle overflow This should return a response indicating that the credit card number was detected:
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
2.3. Filtering flagged content by sending requests to the regex detector Copy linkLink copied to clipboard!
You can use the Guardrails Orchestrator API to send requests to the regex detector. The regex detector filters conversations by flagging content that matches specified regular expression patterns.
Prerequisites
You have deployed a Guardrails Orchestrator with the built-in-detector server, such as in the following example:
Example guardrails_orchestrator_auto_cr.yaml CR
Procedure
Send a request to the built-in detector that you configured. The following example sends a request to a regex detector named
regexto flag personally identifying information.Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example response
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
2.4. Securing prompts Copy linkLink copied to clipboard!
Prevent malicious prompt injection attacks by using specialized detectors to identify and block potentially harmful prompts before they reach your model.
2.5. Mitigating Prompt Injection by using a Hugging Face Prompt Injection detector Copy linkLink copied to clipboard!
These instructions build on the previous HAP scenario example and consider two detectors, HAP and Prompt Injection, deployed as part of the guardrailing system.
The instructions focus on the Hugging Face (HF) Prompt Injection detector, outlining two scenarios:
- Using the Prompt Injection detector with a generative large language model (LLM), deployed as part of the Guardrails Orchestrator service and managed by the TrustyAI Operator, to perform analysis of text input or output of an LLM, using the Orchestrator API.
- Perform standalone detections on text samples using an open-source Detector API.
These examples provided contain sample text that some people may find offensive, as the purpose of the detectors is to demonstrate how to filter out offensive, hateful, or malicious content.
Prerequisites
- You have cluster administrator privileges for your OpenShift cluster.
You have installed the OpenShift CLI (
oc) as described in the appropriate documentation for your cluster:- Installing the OpenShift CLI for OpenShift Container Platform
- Installing the OpenShift CLI for Red Hat OpenShift Service on AWS
- You are familiar with how to configure and deploy the Guardrails Orchestrator service. See Deploying the Guardrails Orchestrator
-
You have the TrustyAI component in your OpenShift AI
DataScienceClusterset toManaged. - You have a large language model (LLM) for chat generation or text classification, or both, deployed in your namespace, to follow the Orchestrator API example.
Scenario 1: Using a Prompt Injection detector with a generative large language model
Create a new project in Openshift using the CLI:
oc new-project detector-demo
oc new-project detector-demoCopy to Clipboard Copied! Toggle word wrap Toggle overflow Create
service_account.yaml:Copy to Clipboard Copied! Toggle word wrap Toggle overflow Apply
service_account.yamlto create the service account:oc apply -f service_account.yaml
oc apply -f service_account.yamlCopy to Clipboard Copied! Toggle word wrap Toggle overflow Create the
prompt_injection_detector.yaml. In the following code example, replace <your_rhoai_version> with your OpenShift AI version (for example, v2.25). This feature requires OpenShift AI version 2.25 or later.Copy to Clipboard Copied! Toggle word wrap Toggle overflow Apply
prompt_injection_detector.yamlto configure a serving runtime, inference service, and route for the Prompt Injection detector you want to incorporate in your Guardrails orchestration service:oc apply -f prompt_injection_detector.yaml
oc apply -f prompt_injection_detector.yamlCopy to Clipboard Copied! Toggle word wrap Toggle overflow Create
hap_detector.yaml:Copy to Clipboard Copied! Toggle word wrap Toggle overflow -
image: Replace<your_rhoai_version>with your OpenShift AI version (for example,v2.25). This feature requires OpenShift AI version 2.25 or later.
-
Apply
hap_detector.yamlto configure a serving runtime, inference service, and route for the HAP detector:oc apply -f hap_detector.yaml
$ oc apply -f hap_detector.yamlCopy to Clipboard Copied! Toggle word wrap Toggle overflow NoteFor more information about configuring the HAP detector and deploying a text generation LLM, see the TrustyAI LLM demos.
Add the detector to the
ConfigMapin the Guardrails Orchestrator:Copy to Clipboard Copied! Toggle word wrap Toggle overflow NoteThe built-in detectors have been switched off by setting the
enableBuiltInDetectorsoption tofalse.Use HAP and Prompt Injection detectors to perform detections on lists of messages comprising a conversation and/or completions from a model:
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
Verification
Within the Orchestrator API, you can use these detectors (HAP and Prompt Injection) to:
- Carry out content filtering for a text generation LLM at the input level, output level, or both.
- Perform standalone detections with the Orchestrator API.
The following images are not supported on arm64, s390x, and ppc64le:
-
quay.io/rgeada/llm_downloader:latest -
quay.io/trustyai/modelmesh-minio-examples:latest -
quay.io/trustyai/guardrails-detector-huggingface-runtime:v0.2.0
As a workaround:
- HAP and Prompt Injection models can be downloaded from Hugging Face, stored in S3-compatible storage, and deployed via the OpenShift AI Dashboard.
-
A compatible image for Hugging Face
ServingRuntimeis available in the OpenShift AI Dashboard under Serving Runtime Templates.
Scenario 2: Using a Prompt Injection detector to perform standalone detections
You can use Prompt Injection detectors to perform standalone detection using a Detector API or the Orchestrator API.
Get the route of your detector:
PROMPT_INJECTION_ROUTE=$(oc get routes prompt-injection-detector-route -o jsonpath='{.spec.host}')PROMPT_INJECTION_ROUTE=$(oc get routes prompt-injection-detector-route -o jsonpath='{.spec.host}')Copy to Clipboard Copied! Toggle word wrap Toggle overflow Check the health status of your detector:
curl -s http://$PROMPT_INJECTION_ROUTE/health | jq
curl -s http://$PROMPT_INJECTION_ROUTE/health | jqCopy to Clipboard Copied! Toggle word wrap Toggle overflow This command returns `"ok"` if the detector is functioning correctly.
This command returns `"ok"` if the detector is functioning correctly.Copy to Clipboard Copied! Toggle word wrap Toggle overflow Perform detections using your detector:
Copy to Clipboard Copied! Toggle word wrap Toggle overflow The following output is displayed:
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
2.6. Moderating and safeguarding content Copy linkLink copied to clipboard!
Filter toxic, hateful, or profane content from user inputs and model outputs to maintain safe and appropriate AI interactions.
2.7. Detecting hateful and profane language Copy linkLink copied to clipboard!
The following example demonstrates how to use Guardrails Orchestrator to monitor user inputs to your LLM, specifically to detect and protect against hateful and profane language (HAP). A comparison query without the detector enabled shows the differences in responses when guardrails is disabled versus enabled.
Prerequisites
- You have cluster administrator privileges for your OpenShift cluster.
You have installed the OpenShift CLI (
oc) as described in the appropriate documentation for your cluster:- Installing the OpenShift CLI for OpenShift Container Platform
- Installing the OpenShift CLI for Red Hat OpenShift Service on AWS
- You have deployed the Guardrails Orchestrator and related detectors. For more information, see Deploying the Guardrails Orchestrator
Procedure
Define a
ConfigMapobject in a YAML file to specify the LLM service you wish to guardrail against and the HAP detector service you want to run the guardrails with. For example, create a file namedorchestrator_cm.yamlwith the following content:Example
orchestrator_cm.yamlyamlCopy to Clipboard Copied! Toggle word wrap Toggle overflow Apply the configuration to deploy the detector:
oc apply -f orchestrator_cm.yaml -n <TEST_NAMESPACE>
$ oc apply -f orchestrator_cm.yaml -n <TEST_NAMESPACE>Copy to Clipboard Copied! Toggle word wrap Toggle overflow Retrieve the external HTTP route for the orchestrator:
GORCH_ROUTE=$(oc get routes gorch-test -o jsonpath='{.spec.host}' -n <TEST_NAMESPACE>)GORCH_ROUTE=$(oc get routes gorch-test -o jsonpath='{.spec.host}' -n <TEST_NAMESPACE>)Copy to Clipboard Copied! Toggle word wrap Toggle overflow Query the orchestrator’s
api/v2/chat/completions-detectionsendpoint without the HAP detector enabled to generate a response without guardrails:Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example response
{"id":"cmpl-f6da55d06ade4792a33d4ae67a07cc38","object":"chat.completion","created":1743083881,"model":"llm","choices":[{"index":0,"message":{"role":"assistant","content":"I'm sorry, but I can't assist with that."},"logprobs":null,"finish_reason":"stop"}],"usage":{"prompt_tokens":56,"total_tokens":69,"completion_tokens":13}}{"id":"cmpl-f6da55d06ade4792a33d4ae67a07cc38","object":"chat.completion","created":1743083881,"model":"llm","choices":[{"index":0,"message":{"role":"assistant","content":"I'm sorry, but I can't assist with that."},"logprobs":null,"finish_reason":"stop"}],"usage":{"prompt_tokens":56,"total_tokens":69,"completion_tokens":13}}Copy to Clipboard Copied! Toggle word wrap Toggle overflow When HAP detections are not enabled on model inputs and outputs through the Guardrails Orchestrator, the model generates output without flagging unsuitable inputs.
Query the
api/v2/chat/completions-detectionsendpoint of the orchestrator and enable the HAP detector to generate a response with guardrails:Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example response
{"id":"086980692dc1431f9c32cd56ba607067","object":"","created":1743084024,"model":"llm","choices":[],"usage":{"prompt_tokens":0,"total_tokens":0,"completion_tokens":0},"detections":{"input":[{"message_index":0,"results":[{"start":0,"end":36,"text":"<explicit_text>, I really hate this stuff","detection":"sequence_classifier","detection_type":"sequence_classification","detector_id":"hap","score":0.9634239077568054}]}]},"warnings":[{"type":"UNSUITABLE_INPUT","message":"Unsuitable input detected. Please check the detected entities on your input and try again with the unsuitable input removed."}]}{"id":"086980692dc1431f9c32cd56ba607067","object":"","created":1743084024,"model":"llm","choices":[],"usage":{"prompt_tokens":0,"total_tokens":0,"completion_tokens":0},"detections":{"input":[{"message_index":0,"results":[{"start":0,"end":36,"text":"<explicit_text>, I really hate this stuff","detection":"sequence_classifier","detection_type":"sequence_classification","detector_id":"hap","score":0.9634239077568054}]}]},"warnings":[{"type":"UNSUITABLE_INPUT","message":"Unsuitable input detected. Please check the detected entities on your input and try again with the unsuitable input removed."}]}Copy to Clipboard Copied! Toggle word wrap Toggle overflow When you enable HAP detections on model inputs and outputs via the Guardrails Orchestrator, unsuitable inputs are clearly flagged and model outputs are not generated.
Optional: You can also enable standalone detections on text by querying the
api/v2/text/detection/contentendpoint:Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example response
{"detections":[{"start":0,"end":36,"text":"You <explicit_text>, I really hate this stuff","detection":"sequence_classifier","detection_type":"sequence_classification","detector_id":"hap","score":0.9634239077568054}]}{"detections":[{"start":0,"end":36,"text":"You <explicit_text>, I really hate this stuff","detection":"sequence_classifier","detection_type":"sequence_classification","detector_id":"hap","score":0.9634239077568054}]}Copy to Clipboard Copied! Toggle word wrap Toggle overflow
2.8. Enforcing configured safety pipelines for LLM inference by using Guardrails Gateway Copy linkLink copied to clipboard!
The Guardrails Gateway is a sidecar image that you can use with the GuardrailsOrchestrator service. When running your AI application in production, you can use the Guardrails Gateway to enforce a consistent, custom set of safety policies using a preset guardrail pipeline. For example, you can create a preset guardrail pipeline for PII detection and language moderation. You can then send chat completions requests to the preset pipeline endpoints without needing to alter existing inference API calls. It provides the OpenAI v1/chat/completions API and allows you to specify which detectors and endpoints you want to use to access the service.
Prerequisites
- You have configured the Guardrails gateway image.
Procedure
Set up the endpoint for the detectors:
GUARDRAILS_GATEWAY=https://$(oc get routes guardrails-gateway -o jsonpath='{.spec.host}')GUARDRAILS_GATEWAY=https://$(oc get routes guardrails-gateway -o jsonpath='{.spec.host}')Copy to Clipboard Copied! Toggle word wrap Toggle overflow Based on the example configurations provided in Configuring the built-in detector and Guardrails gateway, the available endpoint for the model with Guardrails is
$GUARDRAILS_GATEWAY/pii.Query the model with Guardrails
piiendpoint:Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example response
Warning: Unsuitable input detected. Please check the detected entities on your input and try again with the unsuitable input removed. Input Detections: 0) The regex detector flagged the following text: "123-45-6789"
Warning: Unsuitable input detected. Please check the detected entities on your input and try again with the unsuitable input removed. Input Detections: 0) The regex detector flagged the following text: "123-45-6789"Copy to Clipboard Copied! Toggle word wrap Toggle overflow