Chapter 4. Configuring Mistral 3 multimodal features

Configure Mistral 3 models to process image inputs alongside text for vision-language tasks such as image analysis and document understanding.

All Mistral 3 models include built-in vision encoders that process images at their native resolution and aspect ratio.

Prerequisites

  • You have deployed a Mistral 3 model with Red Hat AI Inference Server.

Procedure

  1. Start the inference server with multimodal input enabled:

    podman run --rm -it \
      --device nvidia.com/gpu=all \
      --shm-size=4g \
      -p 8000:8000 \
      --env "HUGGING_FACE_HUB_TOKEN=$HF_TOKEN" \
      registry.redhat.io/rhaiis/vllm-cuda-rhel9:3.3.0 \
        --model RedHatAI/Mistral-Large-3-675B-Instruct-2512-NVFP4 \
        --tokenizer-mode mistral \
        --config-format mistral \
        --load-format mistral \
        --tensor-parallel-size 8 \
        --limit-mm-per-prompt '{"image":10}' \
        --host 0.0.0.0 \
        --port 8000
    • --limit-mm-per-prompt '{"image":10}': sets the maximum number of images per prompt to 10. Adjust based on your use case and available memory.
    Note

    If you are using AI accelerators with less memory than NVIDIA H200, such as NVIDIA A100, you might need to lower the maximum context length to avoid out-of-memory errors. Add the --max-model-len argument to reduce the context length, for example --max-model-len 225000. Alternatively, you can adjust the --gpu-memory-utilization argument to control how much GPU memory is reserved for model weights and KV cache.

  2. Optional: To run a multimodal model in text-only mode, disable image processing to free GPU memory:

    --limit-mm-per-prompt '{"image":0}'

Verification

  1. Check that the model can process an image URL. For example, run the following command:

    $ curl http://localhost:8000/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{
        "model": "RedHatAI/Mistral-Large-3-675B-Instruct-2512-NVFP4",
        "messages": [
          {
            "role": "user",
            "content": [
              {"type": "text", "text": "What is shown in this image?"},
              {"type": "image_url", "image_url": {"url": "https://upload.wikimedia.org/wikipedia/commons/thumb/4/43/Cute_dog.jpg/1280px-Cute_dog.jpg"}}
            ]
          }
        ],
        "max_tokens": 200
      }'
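    When scripting verification, the JSON request body maps directly to a Python dictionary. The following is a minimal sketch that builds and prints the same payload as the curl example; the model name, prompt, and image URL are copied from the example above, and sending the payload with an HTTP client of your choice is left as an exercise:

    ```python
    import json

    # Build the same multimodal chat request body as the curl example above.
    payload = {
        "model": "RedHatAI/Mistral-Large-3-675B-Instruct-2512-NVFP4",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "What is shown in this image?"},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/4/43/Cute_dog.jpg/1280px-Cute_dog.jpg"
                        },
                    },
                ],
            }
        ],
        "max_tokens": 200,
    }

    # POST this JSON to http://localhost:8000/v1/chat/completions with a
    # Content-Type: application/json header to reproduce the curl request.
    print(json.dumps(payload, indent=2))
    ```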
  2. Alternatively, send an image as base64-encoded data:

    $ curl http://localhost:8000/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{
        "model": "RedHatAI/Mistral-Large-3-675B-Instruct-2512-NVFP4",
        "messages": [
          {
            "role": "user",
            "content": [
              {"type": "text", "text": "Describe this chart."},
              {"type": "image_url", "image_url": {"url": "data:image/png;base64,_<BASE64_ENCODED_IMAGE_DATA>_"}}
            ]
          }
        ],
        "max_tokens": 500
      }'
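    To produce the value that replaces `<BASE64_ENCODED_IMAGE_DATA>`, base64-encode the raw image bytes and prefix the result with the appropriate data URL scheme. A minimal sketch in Python; the `to_data_url` helper is illustrative, not part of the product, and the sample bytes stand in for a real image file:

    ```python
    import base64

    def to_data_url(image_bytes: bytes, mime: str = "image/png") -> str:
        """Return image bytes as a base64 data URL for the image_url field."""
        encoded = base64.b64encode(image_bytes).decode("ascii")
        return f"data:{mime};base64,{encoded}"

    # Illustrative stand-in for real image bytes; in practice read them from
    # a file, for example: open("chart.png", "rb").read()
    png_header = b"\x89PNG\r\n\x1a\n"
    data_url = to_data_url(png_header)
    print(data_url)  # data:image/png;base64,iVBORw0KGgo=
    ```

    Use the resulting string as the `url` value in the `image_url` content part of the request body.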