Chapter 3. Converting models with Speculators
Convert an existing Eagle 3 speculator model to the Speculators format for use with Red Hat AI Inference Server. Use this procedure when you have an externally trained Eagle 3 checkpoint that is not already in the Speculators format.
Prerequisites
- You have installed Podman or Docker.
- You are logged in as a user with sudo access.
- You have access to the registry.redhat.io image registry and have logged in.
- You have a Hugging Face account and have generated a Hugging Face access token.
- You have access to a Linux server with at least one NVIDIA AI accelerator installed.
- You have installed the relevant NVIDIA drivers.
- You have installed the NVIDIA Container Toolkit.
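Before starting, you can optionally confirm that the host binaries the procedure relies on are installed. This is a minimal sketch, not part of the official procedure; it only checks that the tools are on `PATH` and does not validate the driver or CDI configuration (`nvidia-ctk` is the NVIDIA Container Toolkit CLI):

```shell
# Optional pre-flight check: report whether the host binaries the
# procedure relies on are installed. This does not validate the
# driver version or the CDI device configuration.
check_tools() {
    for tool in nvidia-smi nvidia-ctk podman; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "missing: $tool"
        fi
    done
}

check_tools
```

If any tool is reported missing, revisit the corresponding prerequisite before continuing.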
This example uses the meta-llama/Meta-Llama-3.1-8B-Instruct model, which requires accepting a license agreement. Before running this procedure, request access at meta-llama/Llama-3.1-8B-Instruct on Hugging Face.
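Because the model is gated, a missing or mistyped token typically only fails once the conversion script starts downloading. As an optional sketch, you can check that `HF_TOKEN` is exported before launching the container; the `hf_` prefix test is a heuristic based on the usual Hugging Face access token format, not a guarantee of validity:

```shell
# Fail fast before launching the container: check that HF_TOKEN is
# exported and looks like a Hugging Face access token.
# Heuristic only: user access tokens usually begin with "hf_".
check_hf_token() {
    if [ -z "${HF_TOKEN:-}" ]; then
        echo "error: HF_TOKEN is not set" >&2
        return 1
    fi
    case "$HF_TOKEN" in
        hf_*) echo "HF_TOKEN looks like a Hugging Face token" ;;
        *)    echo "warning: HF_TOKEN does not start with hf_" ;;
    esac
}
```

For example, after sourcing your token file in the procedure below, running `check_hf_token` should print the confirmation line.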
Procedure
Pull the Red Hat AI Model Optimization Toolkit container image:
$ podman pull registry.redhat.io/rhaii-early-access/model-opt-cuda-rhel9:3.4.0-ea.2

Verify the Speculators version installed in the container:
$ podman run --rm -it \
    registry.redhat.io/rhaii-early-access/model-opt-cuda-rhel9:3.4.0-ea.2 \
    pip show speculators | grep Version

Example output
Version: 0.4.0a1

Create a working directory and clone the upstream Speculators repository:
$ mkdir model-opt && \
    cd model-opt && \
    git clone https://github.com/vllm-project/speculators.git

Check out the Speculators branch that matches the version installed in the container:
$ cd speculators && \
    git checkout v0.4.0+rhaiis

Create or append your HF_TOKEN Hugging Face token to the private.env file and source it:

$ echo "export HF_TOKEN=<YOUR_HF_TOKEN>" > private.env
$ source private.env

If your system has SELinux enabled, configure SELinux to allow device access:
$ sudo setsebool -P container_use_devices 1

Run the apply_eagle3_eagle.sh convert example using the Red Hat AI Model Optimization Toolkit container:
$ podman run --rm \
    -v "$(pwd):/opt/app-root/model-opt" \
    --device nvidia.com/gpu=0 \
    --ipc=host \
    -e HF_TOKEN=$HF_TOKEN \
    registry.redhat.io/rhaii-early-access/model-opt-cuda-rhel9:3.4.0-ea.2 \
    bash /opt/app-root/model-opt/speculators/examples/convert/eagle3/apply_eagle3_eagle.sh

The script downloads the Eagle 3 checkpoint, converts it to the Speculators format, and validates the result.
Verification
- Verify that the output includes Validation succeeded.
- Confirm that the converted model directory exists in your working directory, for example eagle3-llama-3.1-8b-instruct-converted.
Example output
2026-04-17 13:58:49.830 | INFO | speculators.convert.eagle.eagle3_converter:convert:41 - Converting Eagle-3 checkpoint: yuhuili/EAGLE3-LLaMA3.1-Instruct-8B
Fetching 2 files: 100%|██████████| 2/2 [00:06<00:00, 3.04s/it]
2026-04-17 13:59:01.127 | SUCCESS | speculators.convert.eagle.eagle3_converter:convert:88 - Saved to: eagle3-llama-3.1-8b-instruct-converted
2026-04-17 13:59:03.888 | SUCCESS | speculators.convert.eagle.eagle3_converter:_validate_converted_checkpoint:220 - Validation succeeded
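As an extra sanity check beyond the converter's own validation, you can confirm the converted directory has the shape of a Hugging Face-style checkpoint. The expected file layout here is an assumption (a `config.json` plus at least one `.safetensors` weight file), not something guaranteed by the Speculators project:

```shell
# Hedged sanity check: assumes the converted checkpoint directory
# contains a config.json and at least one .safetensors weight file.
check_converted_model() {
    dir="$1"
    if [ ! -f "$dir/config.json" ]; then
        echo "missing: $dir/config.json" >&2
        return 1
    fi
    if ! ls "$dir"/*.safetensors >/dev/null 2>&1; then
        echo "missing: no .safetensors weights in $dir" >&2
        return 1
    fi
    echo "ok: $dir looks like a converted checkpoint"
}

# Example, using the output directory name from the procedure above:
# check_converted_model eagle3-llama-3.1-8b-instruct-converted
```

A non-zero return code indicates which expected file was not found.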