Inference serving geospatial foundation models with Red Hat AI Inference Server
Abstract
Preface
Serve IBM and NASA Prithvi geospatial foundation models using Red Hat AI Inference Server and TerraTorch for satellite imagery analysis.
Chapter 1. About geospatial inference
Geospatial foundation models analyze satellite imagery and remote sensing data for applications such as environmental monitoring, land use classification, and climate analysis. The Prithvi models are developed in collaboration between IBM and NASA.
IBM and NASA Prithvi geospatial foundation models are pre-trained on large datasets of satellite and aerial imagery. These models learn general representations of Earth observation data that can be fine-tuned for specific tasks.
Prithvi geospatial foundation models use a Vision Transformer (ViT) architecture that adapts the transformer model, originally designed for natural language processing, to process image data. ViT divides images into fixed-size patches, which are then processed as sequences similar to tokens in text.
For geospatial applications, ViT models can process multi-spectral satellite imagery with multiple input bands, enabling analysis beyond standard RGB imagery.
You can fine-tune geospatial foundation models using TerraTorch, an open-source library for fine-tuning and inference of geospatial foundation models.
You can find out more about the Prithvi models at huggingface.co/ibm-nasa-geospatial.
Chapter 2. Serving geospatial models with AI Inference Server
Serve IBM and NASA Prithvi geospatial foundation models using AI Inference Server and TerraTorch for satellite imagery analysis.
Prerequisites
- You have installed Podman or Docker.
- You are logged in as a user with sudo access.
- You have access to registry.redhat.io and have logged in.
- You have a Hugging Face account and have generated a Hugging Face access token.
- You have access to a Linux server with data center grade NVIDIA AI accelerators installed.
- For NVIDIA GPUs:
    - Install NVIDIA drivers.
    - Install the NVIDIA Container Toolkit.
    - If your system has multiple NVIDIA GPUs that use NVSwitch, you must have root access to start Fabric Manager.
- You have satellite imagery data in a supported format such as GeoTIFF.
Procedure
Open a terminal on your server host, and log in to registry.redhat.io:

$ podman login registry.redhat.io

Pull the AI Inference Server NVIDIA CUDA container image:
$ podman pull registry.redhat.io/rhaiis/vllm-cuda-rhel9:3.3.0

If your system has SELinux enabled, configure SELinux to allow device access:
$ sudo setsebool -P container_use_devices 1

Create a volume and mount it into the container. Adjust the container permissions so that the container can use it.
$ mkdir -p rhaiis-cache
$ chmod g+rwX rhaiis-cache

Add your HF_TOKEN Hugging Face token to the private.env file, then source the private.env file:

$ echo "export HF_TOKEN=<your_HF_token>" > private.env
$ source private.env

Start the AI Inference Server container image.
For NVIDIA CUDA accelerators, if the host system has multiple GPUs and uses NVSwitch, start NVIDIA Fabric Manager. To detect whether your system uses NVSwitch, first check if files are present in /proc/driver/nvidia-nvswitch/devices/, and then start NVIDIA Fabric Manager. Starting NVIDIA Fabric Manager requires root privileges.

$ ls /proc/driver/nvidia-nvswitch/devices/

PCI device addresses for each available AI accelerator are returned.

$ systemctl start nvidia-fabricmanager

Important: NVIDIA Fabric Manager is only required on systems with multiple GPUs that use NVSwitch. For more information, see NVIDIA Server Architectures.
Check that the AI Inference Server container can access NVIDIA GPUs on the host by running the following command:

$ podman run --rm -it \
    --security-opt=label=disable \
    --device nvidia.com/gpu=all \
    nvcr.io/nvidia/cuda:12.4.1-base-ubi9 \
    nvidia-smi

All available AI accelerators are returned.
Start the container with the TerraTorch backend and the Prithvi geospatial model.

For detailed information about TerraTorch server arguments and configuration options, see TerraTorch configuration options.
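The original start command did not survive extraction in this copy. The following is a minimal sketch, assuming the server arguments described in the TerraTorch configuration options chapter and the ibm-nasa-geospatial/Prithvi-EO-2.0-300M-TL-Sen1Floods11 model from Hugging Face; the model name, port, and mount paths are assumptions to adapt for your environment.

```shell
# Hypothetical invocation; model name, flags, and mounts are assumptions
# drawn from the surrounding steps. Verify them for your release.
podman run --rm -it \
  --device nvidia.com/gpu=all \
  --security-opt=label=disable \
  --shm-size=4g \
  -p 8000:8000 \
  --env-file ./private.env \
  -v ./rhaiis-cache:/opt/app-root/src/.cache:Z \
  registry.redhat.io/rhaiis/vllm-cuda-rhel9:3.3.0 \
  --model ibm-nasa-geospatial/Prithvi-EO-2.0-300M-TL-Sen1Floods11 \
  --model-impl terratorch \
  --trust-remote-code \
  --skip-tokenizer-init \
  --enforce-eager
```

The server listens on port 8000 inside the container; the `-p 8000:8000` mapping exposes it on the host.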
In a separate tab in your terminal, send an inference request with your geospatial data.
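The original request example did not survive extraction in this copy. The following is a minimal sketch, assuming the server listens on localhost:8000 and accepts an IO-processor pooling request; the JSON field names and the image URL are assumptions.

```shell
# Hypothetical request; field names and the image URL are assumptions.
curl -X POST http://localhost:8000/pooling \
  -H "Content-Type: application/json" \
  -d '{
    "data": {
      "data": "https://example.com/imagery/scene.tiff",
      "data_format": "url",
      "out_data_format": "b64_json"
    },
    "softmax": false
  }'
```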
The model returns a JSON response containing base64-encoded prediction data. Decode the data.data field to retrieve the output GeoTIFF file containing segmentation results.
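The decoding step can be sketched as follows. The base64 payload is simulated here with a literal string, because the real value depends on your request; with a saved response, you could extract it first, for example with `jq -r '.data.data' response.json` (the `.data.data` path is an assumption based on the field name above).

```shell
# Simulate the base64-encoded payload; with a real response, extract it
# first, for example: b64=$(jq -r '.data.data' response.json)
b64=$(printf 'fake-geotiff-bytes' | base64)

# Decode the payload into the output GeoTIFF file.
printf '%s' "$b64" | base64 -d > prediction.tif
cat prediction.tif
```

If GDAL is installed, you can inspect a real decoded file with `gdalinfo prediction.tif`.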
Chapter 3. TerraTorch configuration options for geospatial model serving
Use the following server arguments when starting Red Hat AI Inference Server with the TerraTorch backend for geospatial model serving.
| Argument | Description |
|---|---|
| `--skip-tokenizer-init` | Skips tokenizer initialization. Vision models do not require a tokenizer. |
| `--enforce-eager` | Disables CUDA graph optimization for compatibility with geospatial model architectures. |
| `--io-processor-plugin` | Specifies the I/O processor plugin for segmentation tasks. |
| `--enable-mm-embeds` | Enables multimodal embeddings for processing geospatial imagery. |
Geospatial model serving with TerraTorch exposes the /pooling POST API endpoint for geospatial imagery inference requests.
Example request payload
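The payload itself did not survive extraction in this copy. The following is a plausible shape, assuming an IO-processor pooling request; all field names and the image URL are assumptions and may differ between releases.

```shell
# Hypothetical payload for the /pooling endpoint; field names are
# assumptions and may differ between releases.
cat > payload.json <<'EOF'
{
  "data": {
    "data": "https://example.com/imagery/scene.tiff",
    "data_format": "url",
    "out_data_format": "b64_json"
  },
  "softmax": false
}
EOF
cat payload.json
```

You can then send the file with `curl -X POST http://localhost:8000/pooling -H "Content-Type: application/json" -d @payload.json`.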