Chapter 4. Serving and chatting with the models
To interact with various models on Red Hat Enterprise Linux AI, you must first serve a model, which hosts it on a server. You can then chat with the model.
4.1. Serving the model
To interact with the models, you must first activate a model on your machine by serving it. The ilab model serve command starts a vLLM server that allows you to chat with the model.
Prerequisites
- You installed RHEL AI with the bootable container image.
- You initialized InstructLab.
- You downloaded your preferred Granite LLMs.
- You have root user access on your machine.
Procedure
If you do not specify a model, you can serve the default model, granite-7b-redhat-lab, by running the following command:
$ ilab model serve
To serve a specific model, run the following command:
$ ilab model serve --model-path <model-path>
Example command
$ ilab model serve --model-path ~/.cache/instructlab/models/granite-7b-code-instruct
Example output when the model is served and ready
INFO 2024-03-02 02:21:11,352 lab.py:201 Using model 'models/granite-7b-code-instruct' with -1 gpu-layers and 4096 max context size.
Starting server process
After application startup complete see http://127.0.0.1:8000/docs for API.
Press CTRL+C to shut down the server.
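Once the server is ready, you can optionally verify it from a second terminal. The following is a minimal sketch that lists the models the vLLM server exposes through its OpenAI-compatible API, assuming the default listen address shown in the output above:
$ curl http://127.0.0.1:8000/v1/models
The response is a JSON object whose data array contains the served model ID, which is the value to pass as the model name in direct API requests.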
4.1.1. Optional: Running ilab model serve as a service
You can set up a systemd service so that the ilab model serve command runs as a service. The systemd service runs the ilab model serve command in the background and restarts it if it crashes or fails. You can configure the service to start upon system boot.
Prerequisites
- You installed the Red Hat Enterprise Linux AI image on bare metal.
- You initialized InstructLab.
- You downloaded your preferred Granite LLMs.
- You have root user access on your machine.
Procedure
Create a directory for your systemd user service by running the following command:
$ mkdir -p $HOME/.config/systemd/user
Create your systemd service file with the following example configurations:
$ cat << EOF > $HOME/.config/systemd/user/ilab-serve.service
[Unit]
Description=ilab model serve service

[Install]
WantedBy=multi-user.target default.target 1

[Service]
ExecStart=ilab model serve --model-family granite
Restart=always
EOF
1 Specifies to start by default on boot.
Reload the systemd manager configuration by running the following command:
$ systemctl --user daemon-reload
Start the ilab model serve systemd service by running the following command:
$ systemctl --user start ilab-serve.service
You can check that the service is running with the following command:
$ systemctl --user status ilab-serve.service
You can check the service logs by running the following command:
$ journalctl --user-unit ilab-serve.service
To allow the service to start on boot, run the following command:
$ sudo loginctl enable-linger
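Note that the WantedBy targets in the [Install] section of the unit take effect only when the unit is enabled. If the service does not start automatically after lingering is enabled, you can enable it explicitly; this is standard systemd usage rather than a step from this procedure:
$ systemctl --user enable ilab-serve.service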
Optional: There are a few optional commands you can run for maintaining your systemd service.
You can stop the ilab-serve systemd service by running the following command:
$ systemctl --user stop ilab-serve.service
- You can prevent the service from starting on boot by removing "WantedBy=multi-user.target default.target" from the $HOME/.config/systemd/user/ilab-serve.service file.
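Alternatively, if you previously enabled the unit, disabling it has the same effect; standard systemd usage:
$ systemctl --user disable ilab-serve.service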
4.2. Chatting with the model
Once you serve your model, you can chat with it.
The model you are chatting with must match the model you are serving. With the default config.yaml file, the granite-7b-redhat-lab model is the default for serving and chatting.
Prerequisites
- You installed RHEL AI with the bootable container image.
- You initialized InstructLab.
- You downloaded your preferred Granite LLMs.
- You are serving a model.
- You have root user access on your machine.
Procedure
- Since you are serving the model in one terminal window, you must open another terminal to chat with the model.
To chat with the default model, run the following command:
$ ilab model chat
To chat with a specific model, run the following command:
$ ilab model chat --model <model-path>
Example command
$ ilab model chat --model ~/.cache/instructlab/models/granite-7b-code-instruct
Example output of the chatbot
$ ilab model chat
╭────────────────────────────────── system ──────────────────────────────────╮
│ Welcome to InstructLab Chat w/ GRANITE-7B-CODE-INSTRUCT (type /h for help) │
╰─────────────────────────────────────────────────────────────────────────────╯
>>> [S][default]
- Type exit to leave the chatbot.
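The chat interface talks to the same OpenAI-compatible API that ilab model serve exposes. As an illustration only, an equivalent direct request could look like the following sketch, assuming the default local endpoint; the <model-id> placeholder must match the ID reported by the /v1/models endpoint:
$ curl http://127.0.0.1:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "<model-id>", "messages": [{"role": "user", "content": "Hello"}]}'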
4.2.1. Optional: Creating an API key for model chatting
By default, the ilab CLI does not use authentication. If you want to expose your server to the internet, you can create an API key that connects to your server by using the following procedure.
Prerequisites
- You installed the Red Hat Enterprise Linux AI image on bare metal.
- You initialized InstructLab.
- You downloaded your preferred Granite LLMs.
- You have root user access on your machine.
Procedure
Create an API key that is held in the $VLLM_API_KEY parameter by running the following command:
$ export VLLM_API_KEY=$(python -c 'import secrets; print(secrets.token_urlsafe())')
You can view the API key by running the following command:
$ echo $VLLM_API_KEY
Update the config.yaml file by running the following command:
$ ilab config edit
Add the following parameters to the vllm_args section of your config.yaml file, where <api-key-string> specifies your API key string:
serve:
  vllm:
    vllm_args:
      - --api-key
      - <api-key-string>
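The new vllm_args take effect the next time the server starts, so restart serving after you save the file; a minimal sketch, reusing the serve command from the previous section:
$ ilab model serve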
You can verify that the server is using API key authentication by running the following command:
$ ilab model chat
You will then see the following error, which shows an unauthorized user:
openai.AuthenticationError: Error code: 401 - {'error': 'Unauthorized'}
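You can also check authentication directly against the API; a minimal sketch assuming the default local listen address, where the request succeeds only when the bearer token matches the configured key:
$ curl http://127.0.0.1:8000/v1/models -H "Authorization: Bearer $VLLM_API_KEY"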
Verify that your API key is working by running the following command:
$ ilab chat -m granite-7b-redhat-lab --endpoint-url https://inference.rhelai.com/v1 --api-key $VLLM_API_KEY
Example output
$ ilab model chat
╭───────────────────────────── system ─────────────────────────────╮
│ Welcome to InstructLab Chat w/ GRANITE-7B-LAB (type /h for help) │
╰────────────────────────────────────────────────────────────────────╯
>>> [S][default]
4.2.2. Optional: Allowing chat access to a model from a secure endpoint
You can serve an inference endpoint and allow others to interact with models provided with Red Hat Enterprise Linux AI over secure connections by creating a systemd service and setting up an nginx reverse proxy that exposes a secure endpoint. This allows you to share the secure endpoint with others so they can chat with the model over a network.
The following procedure uses self-signed certificates, but it is recommended to use certificates issued by a trusted Certificate Authority (CA).
The following procedure is supported only on bare metal platforms.
Prerequisites
- You installed the Red Hat Enterprise Linux AI image on bare metal.
- You initialized InstructLab.
- You downloaded your preferred Granite LLMs.
- You have root user access on your machine.
Procedure
Create a directory for your certificate file and key by running the following command:
$ mkdir -p `pwd`/nginx/ssl/
Create an OpenSSL configuration file with the proper configurations by running the following command:
$ cat > openssl.cnf <<EOL
[ req ]
default_bits = 2048
distinguished_name = req_distinguished_name 1
x509_extensions = v3_req
prompt = no

[ req_distinguished_name ]
C = US
ST = California
L = San Francisco
O = My Company
OU = My Division
CN = rhelai.redhat.com

[ v3_req ]
subjectAltName = @alt_names 2
basicConstraints = critical, CA:true
subjectKeyIdentifier = hash
authorityKeyIdentifier = keyid:always,issuer

[ alt_names ]
DNS.1 = rhelai.redhat.com 3
DNS.2 = www.rhelai.redhat.com 4
EOL
1 References the [ req_distinguished_name ] section defined in this file.
2 Enables the Subject Alternative Name (SAN) extension by referencing the [ alt_names ] section.
3 4 Specify the DNS names of your server. In the example, the server name is rhelai.redhat.com.
Generate a self-signed certificate with a Subject Alternative Name (SAN) enabled by running the following command:
$ openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout `pwd`/nginx/ssl/rhelai.redhat.com.key -out `pwd`/nginx/ssl/rhelai.redhat.com.crt -config openssl.cnf
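Optionally, you can confirm that the SAN entries were written into the certificate; a quick check using standard openssl tooling, assuming the certificate path from the previous command:
$ openssl x509 -in `pwd`/nginx/ssl/rhelai.redhat.com.crt -noout -text | grep -A1 "Subject Alternative Name"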
Create the Nginx configuration file and add it to the `pwd`/nginx/conf.d directory by running the following commands:
$ mkdir -p `pwd`/nginx/conf.d
$ echo 'server {
    listen 8443 ssl;
    server_name <rhelai.redhat.com>; 1
    ssl_certificate /etc/nginx/ssl/rhelai.redhat.com.crt;
    ssl_certificate_key /etc/nginx/ssl/rhelai.redhat.com.key;
    location / {
        proxy_pass http://127.0.0.1:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
' > `pwd`/nginx/conf.d/rhelai.redhat.com.conf
1 Specify the name of your server. In the example, the server name is rhelai.redhat.com.
Run the Nginx container with the new configurations by running the following command:
$ podman run --net host -v `pwd`/nginx/conf.d:/etc/nginx/conf.d:ro,Z -v `pwd`/nginx/ssl:/etc/nginx/ssl:ro,Z nginx
If you want to use port 443, you must run the podman run command as a root user.
You can now connect to a serving ilab machine by using a secure endpoint URL. Example command:
$ ilab chat -m /instructlab/instructlab/granite-7b-redhat-lab --endpoint-url https://rhelai.redhat.com:8443/v1
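You can also check the proxied endpoint directly; a minimal sketch, where the -k flag skips certificate verification because the certificate is self-signed:
$ curl -k https://rhelai.redhat.com:8443/v1/models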
Optional: You can also get the server certificate and append it to the Certifi CA bundle.
Get the server certificate by running the following command:
$ openssl s_client -connect rhelai.redhat.com:8443 </dev/null 2>/dev/null | openssl x509 -outform PEM > server.crt
Copy the certificate to your system's trusted CA storage directory and update the CA trust store with the following commands:
$ sudo cp server.crt /etc/pki/ca-trust/source/anchors/
$ sudo update-ca-trust
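You can optionally confirm that the certificate is now trusted by the system; a quick check with standard openssl tooling, assuming the server.crt file from the previous steps:
$ openssl verify server.crt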
You can append your certificate to the Certifi CA bundle by running the following command:
$ cat server.crt >> $(python -m certifi)
You can now run ilab model chat with a self-signed certificate. Example command:
$ ilab chat -m /instructlab/instructlab/granite-7b-redhat-lab --endpoint-url https://rhelai.redhat.com:8443/v1