Chapter 3. Converting models with Speculators


Convert an existing Eagle 3 speculator model to the Speculators format for use with Red Hat AI Inference Server. Use this procedure when you have an externally-trained Eagle 3 checkpoint that is not already in the Speculators format.

Prerequisites

  • You have installed Podman or Docker.
  • You are logged in as a user with sudo access.
  • You have access to the registry.redhat.io image registry and have logged in.
  • You have a Hugging Face account and have generated a Hugging Face access token.
  • You have access to a Linux server with at least one NVIDIA AI accelerator installed.
  • You have installed the relevant NVIDIA drivers.
  • You have installed the NVIDIA Container Toolkit.
Note

This example uses the meta-llama/Meta-Llama-3.1-8B-Instruct model, which requires accepting a license agreement. Before running this procedure, request access at meta-llama/Llama-3.1-8B-Instruct on Hugging Face.

Procedure

  1. Pull the Red Hat AI Model Optimization Toolkit container image:

    $ podman pull registry.redhat.io/rhaii-early-access/model-opt-cuda-rhel9:3.4.0-ea.2
  2. Verify the Speculators version installed in the container:

    $ podman run --rm -it \
      registry.redhat.io/rhaii-early-access/model-opt-cuda-rhel9:3.4.0-ea.2 \
      pip show speculators | grep Version

    Example output

    Version: 0.4.0a1

  3. Create a working directory and clone the upstream Speculators repository:

    $ mkdir model-opt && \
    cd model-opt && \
    git clone https://github.com/vllm-project/speculators.git
  4. Check out the Speculators branch that matches the version installed in the container:

    $ cd speculators && \
    git checkout v0.4.0+rhaiis
  5. Create a private.env file containing your HF_TOKEN Hugging Face access token and source it:

    $ echo "export HF_TOKEN=<YOUR_HF_TOKEN>" > private.env
    $ source private.env
  6. If your system has SELinux enabled, configure SELinux to allow device access:

    $ sudo setsebool -P container_use_devices 1
  7. Run the apply_eagle3_eagle.sh conversion example in the Red Hat AI Model Optimization Toolkit container:

    $ podman run --rm \
      -v "$(pwd):/opt/app-root/model-opt" \
      --device nvidia.com/gpu=0 \
      --ipc=host \
      -e HF_TOKEN=$HF_TOKEN \
      registry.redhat.io/rhaii-early-access/model-opt-cuda-rhel9:3.4.0-ea.2 \
      bash /opt/app-root/model-opt/speculators/examples/convert/eagle3/apply_eagle3_eagle.sh

    The script downloads the Eagle 3 checkpoint, converts it to the Speculators format, and validates the result.

Verification

  • Verify that the output includes Validation succeeded.
  • Confirm that the converted model directory exists in your working directory, for example eagle3-llama-3.1-8b-instruct-converted.
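As an optional additional check, you can inspect the converted checkpoint's config.json for Speculators-format metadata. The following is a minimal sketch; the key names checked here (speculators_model_type, speculators_config) are assumptions about the converter's output and may differ between Speculators versions.

```python
# Sketch: heuristic check that a directory looks like a Speculators-format
# checkpoint. Key names are assumptions, not a documented contract.
import json
from pathlib import Path


def looks_like_speculators_checkpoint(model_dir: str) -> bool:
    """Return True if config.json carries Speculators-format metadata."""
    config_path = Path(model_dir) / "config.json"
    if not config_path.is_file():
        return False
    config = json.loads(config_path.read_text())
    # Converted checkpoints are expected to identify themselves in config.json.
    return "speculators_model_type" in config or "speculators_config" in config
```

For example, `looks_like_speculators_checkpoint("eagle3-llama-3.1-8b-instruct-converted")` should return True after a successful conversion, assuming the converter writes one of the metadata keys above.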

Example output

2026-04-17 13:58:49.830 | INFO     | speculators.convert.eagle.eagle3_converter:convert:41 - Converting Eagle-3 checkpoint: yuhuili/EAGLE3-LLaMA3.1-Instruct-8B
Fetching 2 files: 100%|██████████| 2/2 [00:06<00:00,  3.04s/it]
2026-04-17 13:59:01.127 | SUCCESS  | speculators.convert.eagle.eagle3_converter:convert:88 - Saved to: eagle3-llama-3.1-8b-instruct-converted
2026-04-17 13:59:03.888 | SUCCESS  | speculators.convert.eagle.eagle3_converter:_validate_converted_checkpoint:220 - Validation succeeded
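Once conversion succeeds, the checkpoint can be used for speculative decoding. The following is a minimal sketch based on upstream vLLM's --speculative-config option, which accepts a JSON value; the num_speculative_tokens value is illustrative, and the draft model path assumes the converted directory from the procedure above. Consult the Red Hat AI Inference Server documentation for the supported serving invocation.

```shell
# Sketch only: serve the base model with the converted Eagle 3 speculator
# as the draft model. Flag name and JSON keys follow upstream vLLM;
# the token count is an illustrative choice, not a tuned value.
vllm serve meta-llama/Llama-3.1-8B-Instruct \
  --speculative-config '{"model": "./eagle3-llama-3.1-8b-instruct-converted", "num_speculative_tokens": 3}'
```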
