1.11. About querying your AI model


Querying your model through the API is also called model inferencing. Model inferencing is most often used to retrieve information, automate tasks, make predictions, provide data insights, or perform actions.

In general, queries must be constructed using a format compatible with the AI model being used. A model-serving runtime formats queries automatically. The model processes the query according to the underlying training and data, then provides an output. The output is expected to align with the purpose of the model itself, whether that be to give an answer, make a prediction, or perform a task.

The following examples outline general steps to make sure your model is ready for inferencing, and what you might expect in a query output from the serving runtime.

1.11.1. Verifying that your AI model is accessible

Before querying the model through the API, verify that the model is accessible and ready to provide answers based on the connected data. The following examples continue with the OpenVINO Model Server.

Prerequisites

  • You configured the AI model-serving runtime.
  • You uploaded your AI model to MicroShift.
  • MicroShift is running.
  • You installed OpenShift CLI (oc).

Procedure

  1. Get the IP address of the MicroShift node and assign it to the IP variable as the following example command shows:

    $ IP=$(oc get nodes -o json | jq -r '.items[0].status.addresses[0].address')
  2. Identify the name of the route you created by running the following command:

    $ oc get route -n ai-test <route_name> -o yaml

    Replace <route_name> with the actual name of your route.
  3. Extract and assign the HOST value of the route to the DOMAIN variable by running the following command:

    $ DOMAIN=$(oc get route -n ai-test <route_name> -o=jsonpath="{ .status.ingress[0].host }")

    Replace <route_name> with the actual name of your route.
  4. Check that the model is ready by sending a request to the route host, resolved to the MicroShift IP address, by running the following command:

    $ curl -i "${DOMAIN}/v2/models/ovms-resnet50/ready" --connect-to "${DOMAIN}::${IP}:"

    Instead of using the --connect-to "${DOMAIN}::${IP}:" flag, you can also use real DNS, or add the IP address and the domain to the /etc/hosts file.

    Example output

    HTTP/1.1 200 OK
    content-type: application/json
    date: Wed, 12 Mar 2025 16:01:32 GMT
    content-length: 0
    set-cookie: 56bb4b6df4f80f0b59f56aa0a5a91c1a=4af1408b4a1c40925456f73033d4a7d1; path=/; HttpOnly

  5. Query the model metadata by running the following command:

    $ curl "${DOMAIN}/v2/models/ovms-resnet50" --connect-to "${DOMAIN}::${IP}:"

    Example output

    {"name":"ovms-resnet50","versions":["1"],"platform":"OpenVINO","inputs":[{"name":"0","datatype":"FP32","shape":[1,224,224,3]}],"outputs":[{"name":"1463","datatype":"FP32","shape":[1,1000]}]}
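
The metadata response lists the tensor names, data types, and shapes that later requests and responses use. The following Python sketch is illustrative only: the helper name summarize_metadata is hypothetical, and the METADATA string is the example output from this procedure written as a complete JSON document. It shows one way to pull the input and output tensor details out of the response.

```python
import json

# Example metadata from this procedure, written as a complete JSON document.
METADATA = (
    '{"name":"ovms-resnet50","versions":["1"],"platform":"OpenVINO",'
    '"inputs":[{"name":"0","datatype":"FP32","shape":[1,224,224,3]}],'
    '"outputs":[{"name":"1463","datatype":"FP32","shape":[1,1000]}]}'
)


def summarize_metadata(raw):
    """Hypothetical helper: return (model name, inputs, outputs).

    inputs and outputs map each tensor name to a (datatype, shape) tuple.
    """
    meta = json.loads(raw)
    inputs = {t["name"]: (t["datatype"], t["shape"]) for t in meta["inputs"]}
    outputs = {t["name"]: (t["datatype"], t["shape"]) for t in meta["outputs"]}
    return meta["name"], inputs, outputs


name, inputs, outputs = summarize_metadata(METADATA)
print(name, inputs, outputs)
```

In this example, the output tensor "1463" has 1000 elements, which is the list that the later verification step searches for the highest value.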

Next steps

  • Verify that your model is ready for inferencing.
  • Query the model.
  • Verify the model response.
  • Optional: Get the model server metrics.

1.11.2. Getting your AI model ready for inference

Before querying your AI model through the API, prepare the request data in the format that the model server expects. The following examples continue with the OpenVINO Model Server (OVMS) model.

Prerequisites

  • MicroShift is running.
  • You have the xxd utility, which is part of the vim-common package.
  • You configured the model-serving runtime.
  • You uploaded your AI model to MicroShift.

Procedure

  1. Download an image of a bee from the OpenVINO Model Server examples by running the following command:

    $ curl -O https://raw.githubusercontent.com/openvinotoolkit/model_server/main/demos/common/static/images/bee.jpeg
  2. Create the request data by running the following script:

    IMAGE=./bee.jpeg
    REQ=./request.json

    # Add an inference header
    echo -n '{"inputs" : [{"name": "0", "shape": [1], "datatype": "BYTES"}]}' > "${REQ}"
    # Get the size of the inference header. The size must be passed to
    # OpenVINO Model Server later in the form of an HTTP header.
    HEADER_LEN="$(stat -c %s "${REQ}")"
    # Add the size of the data (image) in binary format. OpenVINO Model Server
    # requires 4 bytes in little-endian byte order.
    printf "%08X" "$(stat -c %s "${IMAGE}")" | sed 's/\(..\)/\1\n/g' | tac | tr -d '\n' | xxd -r -p >> "${REQ}"
    # Add the data, that is, append the image to the request file
    cat "${IMAGE}" >> "${REQ}"
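
The request file built by the script has three parts: the JSON inference header, a 4-byte little-endian length of the image, and the image bytes. The following Python sketch builds the same layout in memory. It is a minimal illustration, not part of the documented procedure: the function name build_request is hypothetical, and the placeholder bytes stand in for the contents of bee.jpeg.

```python
import json
import struct


def build_request(image_bytes):
    """Hypothetical helper: return (request body, inference header length)."""
    # JSON inference header describing one BYTES input named "0".
    header = json.dumps(
        {"inputs": [{"name": "0", "shape": [1], "datatype": "BYTES"}]}
    ).encode("utf-8")
    # Length of the image as 4 bytes in little-endian byte order.
    size_prefix = struct.pack("<I", len(image_bytes))
    # Header, then size prefix, then the raw image data.
    return header + size_prefix + image_bytes, len(header)


# Placeholder bytes; with a real image, read the file in binary mode instead,
# for example: open("bee.jpeg", "rb").read()
image = b"\xff\xd8placeholder-image-bytes"
body, header_len = build_request(image)
print(header_len, len(body))
```

The returned header length plays the same role as the HEADER_LEN variable in the shell script. The JSON spacing differs from the shell version, so the two lengths are not expected to be equal, only consistent with their own request body.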

1.11.3. Querying your AI model

Make an inference request against the AI model server that is using the ovms-resnet50 model.

Prerequisites

  • MicroShift is running.
  • You configured the model-serving runtime.
  • You uploaded your AI model to MicroShift.

Procedure

  • Make an inference request against the model server that is using the ovms-resnet50 model by running the following command:

    $ curl \
        --data-binary "@./request.json" \
        --header "Inference-Header-Content-Length: ${HEADER_LEN}" \
        "${DOMAIN}/v2/models/ovms-resnet50/infer" \
        --connect-to "${DOMAIN}::${IP}:" > response.json

    Example inferencing output, saved to the response.json file

    {
        "model_name": "ovms-resnet50",
        "model_version": "1",
        "outputs": [{
                "name": "1463",
                "shape": [1, 1000],
                "datatype": "FP32",
                "data": [ ....... ]
            }]
    }

    The contents of .outputs[0].data were omitted from the example for brevity.
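
The same request can be prepared from Python. The following sketch uses only the standard library and makes one assumption that the procedure does not state: for plain HTTP on the default port, connecting to the node IP address while sending the route host in the Host header approximates the effect of the curl --connect-to flag. The function name make_infer_request and the example host and IP values are hypothetical.

```python
import urllib.request


def make_infer_request(domain, ip, body, header_len, model="ovms-resnet50"):
    """Hypothetical helper: build (but do not send) an inference request."""
    return urllib.request.Request(
        # Assumption: connect to the node IP over plain HTTP on the default port.
        url=f"http://{ip}/v2/models/{model}/infer",
        data=body,
        method="POST",
        headers={
            # Keep the route host so that the router can match the route.
            "Host": domain,
            # Size of the JSON inference header at the start of the body.
            "Inference-Header-Content-Length": str(header_len),
        },
    )


# Illustrative values only; use your own DOMAIN, IP, request body, and HEADER_LEN.
req = make_infer_request("example-route.apps.example.com", "192.0.2.10", b"{}", 2)
print(req.get_method(), req.full_url)

# To send the request and parse the reply, something like the following
# could be used:
#   import json
#   with urllib.request.urlopen(req) as resp:
#       response = json.load(resp)
```

Building the request separately from sending it makes it possible to inspect the URL and headers before anything is sent to the model server.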

Verification

  1. To determine the model’s prediction, get the index of the highest element in .outputs[0].data by using the following Python script:

    import json
    with open('response.json') as f:
        response = json.load(f)
    data = response["outputs"][0]["data"]
    argmax = data.index(max(data))
    print(argmax)

    Example output

    309

    In this example, the element labeled 309 is the model’s response.
  2. Validate the output against resnet’s input data, for example:

    ../../../../demos/common/static/images/bee.jpeg 309
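
When the highest-scoring index does not match the expected label, it can help to look at the next few candidates as well. The following Python sketch extends the argmax idea to the top k indices. The function name top_k and the short list of scores are made up for illustration; with a real response, pass response["outputs"][0]["data"] instead.

```python
def top_k(scores, k=5):
    """Hypothetical helper: return the k highest (index, score) pairs, best first."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in ranked[:k]]


# Made-up scores for illustration; real model output has 1000 elements.
scores = [0.1, 2.5, -1.0, 7.2, 3.3]
best = top_k(scores, k=3)
print(best)  # [(3, 7.2), (4, 3.3), (1, 2.5)]
```

The first pair in the result always has the same index that the argmax script prints.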

Next steps

  • Optional: Query the AI model using other images available in the resnet input data.

1.11.4. Getting the model-server metrics

After making a query, you can get the model server’s metrics to identify bottlenecks, optimize resource allocation, and ensure efficient infrastructure utilization.

Note

You can alternatively configure Open Telemetry for MicroShift to get model-server metrics. See "Adding Open Telemetry to an InferenceService custom resource" for more information.

Prerequisites

  • MicroShift is running.
  • There have been enough queries to provide the metrics data you want to see.

Procedure

  • Get the Prometheus-format metrics of the model server by making a request on the /metrics endpoint by running the following command:

    $ curl "${DOMAIN}/metrics" --connect-to "${DOMAIN}::${IP}:"

    Partial example output

    # HELP ovms_requests_success Number of successful requests to a model or a DAG.
    # TYPE ovms_requests_success counter
    ovms_requests_success{api="KServe",interface="REST",method="ModelReady",name="ovms-resnet50"} 4
    ovms_requests_success{api="KServe",interface="REST",method="ModelMetadata",name="ovms-resnet50",version="1"} 1
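
The metrics output is plain text, so it can be post-processed with a short script. The following Python sketch parses sample lines like the ones in the partial example output into (name, labels, value) tuples. It is a simplified illustration, not a complete parser for the Prometheus text format: it does not handle escaped quotes in label values or optional timestamps. The function name parse_metrics is hypothetical.

```python
import re

# Sample text copied from the partial example output.
SAMPLE = """\
# HELP ovms_requests_success Number of successful requests to a model or a DAG.
# TYPE ovms_requests_success counter
ovms_requests_success{api="KServe",interface="REST",method="ModelReady",name="ovms-resnet50"} 4
ovms_requests_success{api="KServe",interface="REST",method="ModelMetadata",name="ovms-resnet50",version="1"} 1
"""

# metric_name{label="value",...} value
LINE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL = re.compile(r'(\w+)="([^"]*)"')


def parse_metrics(text):
    """Hypothetical helper: return a list of (name, labels dict, float value)."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and HELP/TYPE comments.
        if not line or line.startswith("#"):
            continue
        match = LINE.match(line)
        if match is None:
            continue
        name, labels, value = match.groups()
        samples.append((name, dict(LABEL.findall(labels or "")), float(value)))
    return samples


samples = parse_metrics(SAMPLE)
for name, labels, value in samples:
    print(name, labels["method"], value)
```

For the sample text, this reports four successful ModelReady requests and one successful ModelMetadata request.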
