Home
Learn
AI quickstarts
rh-llm-cpu-serving

Serve a lightweight HR assistant
Copy link

Replace hours spent searching policy documents with higher-value relational work.

Detailed description
Copy link

The Assistant to the HR Representative is a lightweight quickstart designed to give HR Representatives in Financial Services a trusted sounding board for discussions and decisions. Chat with this assistant for quick insights and actionable advice.

This quickstart was designed for environments where GPUs are not available or necessary, making it ideal for lightweight inference use cases, prototyping, or constrained environments. By making the most of vLLM on CPU-based infrastructure, this Assistant to the HR Representative can be deployed to almost any OpenShift AI environment.

This quickstart includes a Helm chart for deploying:

An OpenShift AI Project.
vLLM with CPU support running an instance of TinyLlama.
AnythingLLM, a versatile chat interface, running as a workbench and connected to the vLLM.

Use this project to quickly spin up a minimal vLLM instance and start serving models like TinyLlama on CPU—no GPU required. 🚀

Architecture diagrams
Copy link

Requirements
Copy link

Minimum hardware requirements
Copy link

No GPU needed! 🤖
2 cores
4 Gi
Storage: 5Gi

Minimum software requirements
Copy link

Red Hat OpenShift 4.16.24 or later
Red Hat OpenShift AI 2.16.2 or later
Dependencies for Single-model server:
- Red Hat OpenShift Service Mesh
- Red Hat OpenShift Serverless

Required user permissions
Copy link

Standard user. No elevated cluster permissions required.

Deploy
Copy link

Follow the below steps to deploy and test the HR assistant.

Clone
Copy link

git clone https://github.com/rh-ai-quickstart/llm-cpu-serving.git && \
    cd llm-cpu-serving/

git clone https://github.com/rh-ai-quickstart/llm-cpu-serving.git && \
    cd llm-cpu-serving/

Copy to Clipboard

Toggle word wrap

Create the project
Copy link

PROJECT="hr-assistant"

oc new-project ${PROJECT}

PROJECT="hr-assistant"

oc new-project ${PROJECT}

Copy to Clipboard

Toggle word wrap

Install with Helm
Copy link

helm install ${PROJECT} helm/ --namespace  ${PROJECT}

helm install ${PROJECT} helm/ --namespace  ${PROJECT}

Copy to Clipboard

Toggle word wrap

Wait for pods
Copy link

oc -n ${PROJECT}  get pods -w

oc -n ${PROJECT}  get pods -w

Copy to Clipboard

Toggle word wrap

(Output)
NAME                                         READY   STATUS    RESTARTS   AGE
anythingllm-0                                 3/3     Running     0          76s
anythingllm-seed-lchf6                        0/1     Completed   0          76s
tinyllama-1b-cpu-predictor-544bdf75f9-x9fwh   2/2     Running     0          75s

(Output)
NAME                                         READY   STATUS    RESTARTS   AGE
anythingllm-0                                 3/3     Running     0          76s
anythingllm-seed-lchf6                        0/1     Completed   0          76s
tinyllama-1b-cpu-predictor-544bdf75f9-x9fwh   2/2     Running     0          75s

Copy to Clipboard

Toggle word wrap

Test
Copy link

You can get the OpenShift AI Dashboard URL by:

oc get routes rhods-dashboard -n redhat-ods-applications

oc get routes rhods-dashboard -n redhat-ods-applications

Copy to Clipboard

Toggle word wrap

Once inside the dashboard, navigate to Data Science Projects -> tinyllama-cpu-demo (or what you called your ${PROJECT} if you changed from default).

OpenShift AI Projects

Inside the project you can see Workbenches, open up the one for AnythingLLM.

OpenShift AI Projects

Finally, click on the Assistant to the HR Representative Workspace that's pre-created for you and you can start chatting with your Assistant to the HR Representative! :)
Try for example asking it:

Hi, one of our employees is going to get a raise, what do I need to keep in mind for this?

Hi, one of our employees is going to get a raise, what do I need to keep in mind for this?

Copy to Clipboard

Toggle word wrap

It will provide you a reply and some citations related to the question.

AnythingLLM

Delete
Copy link

helm uninstall ${PROJECT} --namespace ${PROJECT}

helm uninstall ${PROJECT} --namespace ${PROJECT}

Copy to Clipboard

Toggle word wrap

References
Copy link

The runtime is built from vLLM CPU
Runtime image is pushed to quay.io/repository/rh-aiservices-bu/vllm-cpu-openai-ubi9
Code for Runtime image and deployment can be found on github.com/rh-aiservices-bu/llm-on-openshift

Serve a lightweight HR assistant

Serve a lightweight HR assistant
Copy link

Detailed description
Copy link

Architecture diagrams
Copy link

Requirements
Copy link

Minimum hardware requirements
Copy link

Recommended hardware requirements
Copy link

Minimum software requirements
Copy link

Required user permissions
Copy link

Deploy
Copy link

Clone
Copy link

Create the project
Copy link

Install with Helm
Copy link

Wait for pods
Copy link

Test
Copy link

Delete
Copy link

References
Copy link

Learn

Try, buy, & sell

Communities

About Red Hat Documentation

Making open source more inclusive

About Red Hat

Theme

Red Hat legal and privacy links

Red Hat legal and privacy links

Serve a lightweight HR assistant

Serve a lightweight HR assistantCopy linkLink copied!

Detailed descriptionCopy linkLink copied!

Architecture diagramsCopy linkLink copied!

RequirementsCopy linkLink copied!

Minimum hardware requirementsCopy linkLink copied!

Recommended hardware requirementsCopy linkLink copied!

Minimum software requirementsCopy linkLink copied!

Required user permissionsCopy linkLink copied!

DeployCopy linkLink copied!

CloneCopy linkLink copied!

Create the projectCopy linkLink copied!

Install with HelmCopy linkLink copied!

Wait for podsCopy linkLink copied!

TestCopy linkLink copied!

DeleteCopy linkLink copied!

ReferencesCopy linkLink copied!

Learn

Try, buy, & sell

Communities

About Red Hat Documentation

Making open source more inclusive

About Red Hat

Theme

Red Hat legal and privacy links

Red Hat legal and privacy links

Serve a lightweight HR assistant
Copy link

Detailed description
Copy link

Architecture diagrams
Copy link

Requirements
Copy link

Minimum hardware requirements
Copy link

Recommended hardware requirements
Copy link

Minimum software requirements
Copy link

Required user permissions
Copy link

Deploy
Copy link

Clone
Copy link

Create the project
Copy link

Install with Helm
Copy link

Wait for pods
Copy link

Test
Copy link

Delete
Copy link

References
Copy link