Chapter 3. Converting models with Speculators


Convert an existing Eagle 3 speculator model to the Speculators format for use with Red Hat AI Inference Server. Use this procedure when you have an externally-trained Eagle 3 checkpoint that is not already in the Speculators format.

Prerequisites

  • You have installed Podman or Docker.
  • You are logged in as a user with sudo access.
  • You have access to the registry.redhat.io image registry and have logged in.
  • You have a Hugging Face account and have generated a Hugging Face access token.
  • You have access to a Linux server with at least one NVIDIA AI accelerator installed.
  • You have installed the relevant NVIDIA drivers.
  • You have installed the NVIDIA Container Toolkit.
Note

This example uses the meta-llama/Meta-Llama-3.1-8B-Instruct model, which requires accepting a license agreement. Before running this procedure, request access at meta-llama/Llama-3.1-8B-Instruct on Hugging Face.

Procedure

  1. Pull the Red Hat AI Model Optimization Toolkit container image:

    $ podman pull registry.redhat.io/rhaii-early-access/model-opt-cuda-rhel9:3.4.0-ea.2
  2. Verify the Speculators version installed in the container:

    $ podman run --rm -it \
      registry.redhat.io/rhaii-early-access/model-opt-cuda-rhel9:3.4.0-ea.2 \
      pip show speculators | grep Version

    Example output

    Version: 0.4.0a1

  3. Create a working directory and clone the upstream Speculators repository:

    $ mkdir model-opt && \
    cd model-opt && \
    git clone https://github.com/vllm-project/speculators.git
  4. Check out the Speculators branch that matches the version installed in the container:

    $ cd speculators && \
    git checkout v0.4.0+rhaiis
  5. Create a private.env file containing your HF_TOKEN Hugging Face access token and source it:

    $ echo "export HF_TOKEN=<YOUR_HF_TOKEN>" > private.env
    $ source private.env
  6. If your system has SELinux enabled, configure SELinux to allow device access:

    $ sudo setsebool -P container_use_devices 1
  7. Run the apply_eagle3_eagle.sh conversion example in the Red Hat AI Model Optimization Toolkit container:

    $ podman run --rm \
      -v "$(pwd):/opt/app-root/model-opt" \
      --device nvidia.com/gpu=0 \
      --ipc=host \
      -e HF_TOKEN=$HF_TOKEN \
      registry.redhat.io/rhaii-early-access/model-opt-cuda-rhel9:3.4.0-ea.2 \
      bash /opt/app-root/model-opt/speculators/examples/convert/eagle3/apply_eagle3_eagle.sh

    The script downloads the Eagle 3 checkpoint, converts it to the Speculators format, and validates the result.

Verification

  • Verify that the output includes Validation succeeded.
  • Confirm that the converted model directory exists in your working directory, for example eagle3-llama-3.1-8b-instruct-converted.
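As an optional additional check, you can inspect the converted checkpoint's config.json for Speculators-format metadata. The following is a minimal sketch; the key names checked here (speculators_model_type, speculators_config) are assumptions about the converter's output and may differ between Speculators versions.

```python
# Sketch: heuristic check that a directory looks like a Speculators-format
# checkpoint. Key names are assumptions, not a documented contract.
import json
from pathlib import Path


def looks_like_speculators_checkpoint(model_dir: str) -> bool:
    """Return True if config.json carries Speculators-format metadata."""
    config_path = Path(model_dir) / "config.json"
    if not config_path.is_file():
        return False
    config = json.loads(config_path.read_text())
    # Converted checkpoints are expected to identify themselves in config.json.
    return "speculators_model_type" in config or "speculators_config" in config
```

For example, `looks_like_speculators_checkpoint("eagle3-llama-3.1-8b-instruct-converted")` should return True after a successful conversion, assuming the converter writes one of the metadata keys above.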

Example output

2026-04-17 13:58:49.830 | INFO     | speculators.convert.eagle.eagle3_converter:convert:41 - Converting Eagle-3 checkpoint: yuhuili/EAGLE3-LLaMA3.1-Instruct-8B
Fetching 2 files: 100%|██████████| 2/2 [00:06<00:00,  3.04s/it]
2026-04-17 13:59:01.127 | SUCCESS  | speculators.convert.eagle.eagle3_converter:convert:88 - Saved to: eagle3-llama-3.1-8b-instruct-converted
2026-04-17 13:59:03.888 | SUCCESS  | speculators.convert.eagle.eagle3_converter:_validate_converted_checkpoint:220 - Validation succeeded
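Once conversion succeeds, the checkpoint can be used for speculative decoding. The following is a minimal sketch based on upstream vLLM's --speculative-config option, which accepts a JSON value; the num_speculative_tokens value is illustrative, and the draft model path assumes the converted directory from the procedure above. Consult the Red Hat AI Inference Server documentation for the supported serving invocation.

```shell
# Sketch only: serve the base model with the converted Eagle 3 speculator
# as the draft model. Flag name and JSON keys follow upstream vLLM;
# the token count is an illustrative choice, not a tuned value.
vllm serve meta-llama/Llama-3.1-8B-Instruct \
  --speculative-config '{"model": "./eagle3-llama-3.1-8b-instruct-converted", "num_speculative_tokens": 3}'
```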
