Chapter 5. Serving and chatting with the models
To interact with models on Red Hat Enterprise Linux AI, you must first serve a model, which hosts it on a server, and then you can chat with it.
5.1. Serving the model
To interact with the models, you must first activate a model on your machine by serving it. The ilab model serve command starts a vLLM server that allows you to chat with the model.
Prerequisites
- You installed RHEL AI with the bootable container image.
- You initialized InstructLab.
- You downloaded your preferred Granite LLMs.
- You have root user access on your machine.
Procedure
If you do not specify a model, you can serve the default model, granite-7b-redhat-lab, by running the following command:

$ ilab model serve

To serve a specific model, run the following command:

$ ilab model serve --model-path <model-path>

Example command

$ ilab model serve --model-path ~/.cache/instructlab/models/granite-8b-code-instruct

Example output when the model is served and ready

INFO 2024-03-02 02:21:11,352 lab.py:201 Using model 'models/granite-8b-code-instruct' with -1 gpu-layers and 4096 max context size.
Starting server process
After application startup complete see http://127.0.0.1:8000/docs for API.
Press CTRL+C to shut down the server.
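The vLLM server exposes an OpenAI-compatible API, so you can optionally confirm from another terminal that the endpoint is up before chatting. A minimal sketch, assuming the default 127.0.0.1:8000 address shown in the example output:

$ # List the models the local vLLM server is currently serving
$ curl http://127.0.0.1:8000/v1/models | jq .

The response should include the served model's path in the data field.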
5.1.1. Optional: Running ilab model serve as a service
You can set up a systemd service so that the ilab model serve command runs as a persistent service. The systemd service runs the ilab model serve command in the background and restarts it if it crashes or fails. You can also configure the service to start at system boot.
Prerequisites
- You installed the Red Hat Enterprise Linux AI image on bare metal.
- You initialized InstructLab.
- You downloaded your preferred Granite LLMs.
- You have root user access on your machine.
Procedure
Create a directory for your systemd user service by running the following command:

$ mkdir -p $HOME/.config/systemd/user

Create your systemd service file with the following example configurations:

$ cat << EOF > $HOME/.config/systemd/user/ilab-serve.service
[Unit]
Description=ilab model serve service

[Install]
WantedBy=multi-user.target default.target 1

[Service]
ExecStart=ilab model serve --model-family granite
Restart=always
EOF

1   Specifies to start by default on boot.
Reload the systemd manager configuration by running the following command:

$ systemctl --user daemon-reload

Start the ilab model serve systemd service by running the following command:

$ systemctl --user start ilab-serve.service

You can check that the service is running with the following command:

$ systemctl --user status ilab-serve.service

You can check the service logs by running the following command:

$ journalctl --user-unit ilab-serve.service

To allow the service to start on boot, run the following command:

$ sudo loginctl enable-linger

Optional: There are a few optional commands you can run for maintaining your systemd service.

You can stop the ilab-serve systemd service by running the following command:

$ systemctl --user stop ilab-serve.service

You can prevent the service from starting on boot by removing "WantedBy=multi-user.target default.target" from the $HOME/.config/systemd/user/ilab-serve.service file.
5.1.2. Optional: Allowing access to a model from a secure endpoint
You can serve an inference endpoint and allow others to interact with models provided with Red Hat Enterprise Linux AI over secure connections by creating a systemd service and setting up an nginx reverse proxy that exposes a secure endpoint. This allows you to share the secure endpoint with others so they can chat with the model over a network.

The following procedure uses self-signed certificates, but it is recommended to use certificates issued by a trusted Certificate Authority (CA).
The following procedure is supported only on bare metal platforms.
Prerequisites
- You installed the Red Hat Enterprise Linux AI image on bare metal.
- You initialized InstructLab.
- You downloaded your preferred Granite LLMs.
- You have root user access on your machine.
Procedure
Create a directory for your certificate file and key by running the following command:

$ mkdir -p `pwd`/nginx/ssl/

Create an OpenSSL configuration file with the proper configurations by running the following command:

$ cat > openssl.cnf <<EOL
[ req ]
default_bits = 2048
distinguished_name = req_distinguished_name
x509_extensions = v3_req
prompt = no

[ req_distinguished_name ]
C = US
ST = California
L = San Francisco
O = My Company
OU = My Division
CN = rhelai.redhat.com

[ v3_req ]
subjectAltName = @alt_names
basicConstraints = critical, CA:true
subjectKeyIdentifier = hash
authorityKeyIdentifier = keyid:always,issuer

[ alt_names ]
DNS.1 = rhelai.redhat.com
DNS.2 = www.rhelai.redhat.com
EOL

Replace rhelai.redhat.com and www.rhelai.redhat.com with the DNS names of your server.

Generate a self-signed certificate with a Subject Alternative Name (SAN) enabled with the following command:
$ openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout `pwd`/nginx/ssl/rhelai.redhat.com.key -out `pwd`/nginx/ssl/rhelai.redhat.com.crt -config openssl.cnf
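Optionally, you can inspect the generated certificate to confirm that the SAN entries were applied before wiring it into nginx. A minimal sketch using standard openssl options:

$ # Print the certificate text and show the Subject Alternative Name entries
$ openssl x509 -in `pwd`/nginx/ssl/rhelai.redhat.com.crt -noout -text | grep -A1 'Subject Alternative Name'

The output should list the DNS entries from the [ alt_names ] section of openssl.cnf.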
Create the Nginx Configuration file and add it to the `pwd`/nginx/conf.d directory by running the following commands:

$ mkdir -p `pwd`/nginx/conf.d
$ echo 'server {
    listen 8443 ssl;
    server_name rhelai.redhat.com; 1
    ssl_certificate /etc/nginx/ssl/rhelai.redhat.com.crt;
    ssl_certificate_key /etc/nginx/ssl/rhelai.redhat.com.key;

    location / {
        proxy_pass http://127.0.0.1:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
' > `pwd`/nginx/conf.d/rhelai.redhat.com.conf

1   Specify the name of your server. In the example, the server name is rhelai.redhat.com.
Run the Nginx container with the new configurations by running the following command:
$ podman run --net host -v `pwd`/nginx/conf.d:/etc/nginx/conf.d:ro,Z -v `pwd`/nginx/ssl:/etc/nginx/ssl:ro,Z nginx

If you want to use port 443, you must run the podman run command as a root user.
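Before sharing the endpoint, you can optionally check that nginx is forwarding traffic to the model server. A minimal sketch, assuming the example server name and port used in this procedure; the -k flag skips certificate verification and is only acceptable here because the certificate is self-signed:

$ # Query the models endpoint through the nginx proxy, ignoring the self-signed certificate
$ curl -k https://rhelai.redhat.com:8443/v1/models

If the proxy is working, this returns the same response as querying http://127.0.0.1:8000/v1/models on the serving machine.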
You can now connect to a serving ilab machine using a secure endpoint URL. Example command:

$ ilab model chat -m /instructlab/instructlab/granite-7b-redhat-lab --endpoint-url https://rhelai.redhat.com:8443/v1

You can also connect to the serving RHEL AI machine with the following command:
$ curl --location 'https://rhelai.redhat.com:8443/v1/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <api-key>' \
--data '{
    "model": "/var/home/cloud-user/.cache/instructlab/models/granite-7b-redhat-lab",
    "messages": [
        {
            "role": "system",
            "content": "You are a helpful assistant."
        },
        {
            "role": "user",
            "content": "Hello!"
        }
    ]
}' | jq .

where

<api-key>
    Specify your API key. You can create your own API key by following the procedure in "Creating an API key for chatting with a model".
Optional: You can also get the server certificate and append it to the Certifi CA bundle.
Get the server certificate by running the following command:
$ openssl s_client -connect rhelai.redhat.com:8443 </dev/null 2>/dev/null | openssl x509 -outform PEM > server.crt

Copy the certificate to your system's trusted CA storage directory and update the CA trust store with the following commands:

$ sudo cp server.crt /etc/pki/ca-trust/source/anchors/
$ sudo update-ca-trust
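You can optionally confirm that the certificate was anchored in the system trust store. A minimal sketch using the p11-kit trust tool shipped with RHEL; the grep pattern assumes the example server name used in this procedure:

$ # Search the system trust store for the newly anchored certificate
$ trust list | grep -i rhelai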
You can append your certificate to the Certifi CA bundle by running the following command:

$ cat server.crt >> $(python -m certifi)

You can now run ilab model chat with a self-signed certificate. Example command:

$ ilab model chat -m /instructlab/instructlab/granite-7b-redhat-lab --endpoint-url https://rhelai.redhat.com:8443/v1
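With the certificate appended to the Certifi bundle, you can also verify the TLS connection without disabling certificate checks. A minimal sketch; the --cacert option points curl at the same bundle that python -m certifi prints:

$ # Verify the TLS connection against the updated Certifi bundle
$ curl --cacert $(python -m certifi) https://rhelai.redhat.com:8443/v1/models

Because the server certificate is now in the bundle, the request succeeds without the -k flag.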
5.2. Chatting with the model
After you serve your model, you can chat with it.
The model you are chatting with must match the model you are serving. With the default config.yaml file, the granite-7b-redhat-lab model is the default for serving and chatting.
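For reference, both defaults live in your config.yaml file. A minimal sketch of the relevant keys, assuming the default InstructLab configuration layout; the exact paths on your system may differ:

serve:
  # Model that ilab model serve hosts by default
  model_path: ~/.cache/instructlab/models/granite-7b-redhat-lab
chat:
  # Model that ilab model chat expects by default; keep it matching the served model
  model: ~/.cache/instructlab/models/granite-7b-redhat-lab

You can inspect or change these values with the ilab config edit command described later in this chapter.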
Prerequisites
- You installed RHEL AI with the bootable container image.
- You initialized InstructLab.
- You downloaded your preferred Granite LLMs.
- You are serving a model.
- You have root user access on your machine.
Procedure
- Since you are serving the model in one terminal window, you must open another terminal to chat with the model.
To chat with the default model, run the following command:

$ ilab model chat

To chat with a specific model, run the following command:

$ ilab model chat --model <model-path>

Example command

$ ilab model chat --model ~/.cache/instructlab/models/granite-8b-code-instruct
Example output of the chatbot
$ ilab model chat
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────── system ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Welcome to InstructLab Chat w/ GRANITE-8B-CODE-INSTRUCT (type /h for help) │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
>>> [S][default]
Type exit to leave the chatbot.
5.2.1. Optional: Creating an API key for chatting with a model
By default, the ilab CLI does not use authentication. If you want to expose your server to the internet, you can create an API key that connects to your server with the following procedure.
Prerequisites
- You installed the Red Hat Enterprise Linux AI image on bare metal.
- You initialized InstructLab.
- You downloaded your preferred Granite LLMs.
- You have root user access on your machine.
Procedure
Create an API key that is held in the $VLLM_API_KEY parameter by running the following command:

$ export VLLM_API_KEY=$(python -c 'import secrets; print(secrets.token_urlsafe())')

You can view the API key by running the following command:

$ echo $VLLM_API_KEY

Update the config.yaml by running the following command:

$ ilab config edit

Add the following parameters to the vllm_args section of your config.yaml file:

serve:
  vllm:
    vllm_args:
    - --api-key
    - <api-key-string>

where

<api-key-string>
    Specify your API key string.
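Note that the exported variable is local to the current shell, so the key is lost in new terminals, such as the second terminal used for chatting. A minimal sketch of one way to persist it, which is an assumption for convenience rather than part of this procedure:

$ # Persist the generated key so new shells can pass it with --api-key
$ echo "export VLLM_API_KEY=$VLLM_API_KEY" >> ~/.bash_profile
$ source ~/.bash_profile

Whichever approach you use, the string you paste as <api-key-string> in config.yaml must match the $VLLM_API_KEY value that clients send.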
You can verify that the server is using API key authentication by running the following command:
$ ilab model chat

The command fails with the following error, which shows an unauthorized user:

openai.AuthenticationError: Error code: 401 - {'error': 'Unauthorized'}

Verify that your API key is working by running the following command:

$ ilab model chat -m granite-7b-redhat-lab --endpoint-url https://inference.rhelai.com/v1 --api-key $VLLM_API_KEY

Example output
$ ilab model chat
╭──────────────────────────────────────────── system ────────────────────────────────────────────╮
│ Welcome to InstructLab Chat w/ GRANITE-7B-LAB (type /h for help)                                │
╰─────────────────────────────────────────────────────────────────────────────────────────────────╯
>>> [S][default]