1.11. About querying your AI model
Querying your model through the API is also called model inferencing. Model inferencing is most often used to retrieve information, automate tasks, make predictions, provide data insights, or perform actions.
In general, queries must be constructed in a format that is compatible with the AI model being used. A model-serving runtime formats queries automatically. The model processes the query according to its training and the underlying data, and then provides an output. The output is expected to align with the purpose of the model itself, whether that is to give an answer, make a prediction, or perform a task.
The following examples outline general steps to make sure your model is ready for inferencing, and what you might expect in a query output from the serving runtime.
1.11.1. Verifying that your AI model is accessible
Before querying the model through the API, you can check that the model is accessible and ready to provide answers based on the connected data. The following examples continue with the OpenVINO Model Server.
Prerequisites
- You configured the AI model-serving runtime.
- You uploaded your AI model to MicroShift.
- MicroShift is running.
- You installed the OpenShift CLI (oc).
Procedure
Get the IP address of the MicroShift node and assign it to the IP variable as the following example command shows:
$ IP=$(oc get nodes -o json | jq -r '.items[0].status.addresses[0].address')
Identify the name of the route you created by running the following command:
$ oc get route -n ai-test <route_name> -o yaml
- Replace <route_name> with the actual name of your route.
Extract the host value of the route and assign it to the DOMAIN variable by running the following command:
$ DOMAIN=$(oc get route -n ai-test <route_name> -o=jsonpath="{ .status.ingress[0].host }")
- Replace <route_name> with the actual name of your route.
Check that the model is ready by sending a request to its ready endpoint through the route, directing the connection to the MicroShift node IP address, by running the following command:
$ curl -i "${DOMAIN}/v2/models/ovms-resnet50/ready" --connect-to "${DOMAIN}::${IP}:"
- Instead of using the --connect-to "${DOMAIN}::${IP}:" flag, you can also use real DNS, or add the IP address and the domain to the /etc/hosts file.
Example output
HTTP/1.1 200 OK
content-type: application/json
date: Wed, 12 Mar 2025 16:01:32 GMT
content-length: 0
set-cookie: 56bb4b6df4f80f0b59f56aa0a5a91c1a=4af1408b4a1c40925456f73033d4a7d1; path=/; HttpOnly
Query the model metadata by running the following command:
$ curl "${DOMAIN}/v2/models/ovms-resnet50" --connect-to "${DOMAIN}::${IP}:"
Example output
{"name":"ovms-resnet50","versions":["1"],"platform":"OpenVINO","inputs":[{"name":"0","datatype":"FP32","shape":[1,224,224,3]}],"outputs":[{"name":"1463","datatype":"FP32","shape":[1,1000]}]}
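For scripted checks, the same lookups and requests can be done from Python with only the standard library. The following is a minimal sketch, not part of MicroShift or OVMS tooling: the function names are hypothetical, it assumes you load the JSON output of `oc get nodes -o json` and `oc get route -n ai-test <route_name> -o json`, and it emulates `curl --connect-to` by opening a plain HTTP connection to the node IP on port 80 (assumed, because the curl commands use no https:// scheme) while sending the route host in the `Host` header.

```python
import http.client
import json


def node_ip(nodes: dict) -> str:
    """Same lookup as jq '.items[0].status.addresses[0].address'."""
    return nodes["items"][0]["status"]["addresses"][0]["address"]


def route_host(route: dict) -> str:
    """Same lookup as jsonpath '{ .status.ingress[0].host }'."""
    return route["status"]["ingress"][0]["host"]


def get_via_ip(ip: str, domain: str, path: str, port: int = 80) -> tuple:
    """Emulate curl --connect-to: connect to the node IP, but send the
    route host in the Host header so the router can match the route."""
    conn = http.client.HTTPConnection(ip, port, timeout=10)
    try:
        conn.request("GET", path, headers={"Host": domain})
        resp = conn.getresponse()
        return resp.status, resp.read()
    finally:
        conn.close()


def is_ready(ip: str, domain: str, model: str = "ovms-resnet50") -> bool:
    """True when the model's ready endpoint answers HTTP 200."""
    status, _ = get_via_ip(ip, domain, f"/v2/models/{model}/ready")
    return status == 200


def model_metadata(ip: str, domain: str, model: str = "ovms-resnet50") -> dict:
    """Fetch and decode the model metadata document."""
    status, body = get_via_ip(ip, domain, f"/v2/models/{model}")
    if status != 200:
        raise RuntimeError(f"metadata request failed: HTTP {status}")
    return json.loads(body)


def input_spec(metadata: dict) -> list:
    """Return (name, datatype, shape) for each model input."""
    return [(i["name"], i["datatype"], i["shape"]) for i in metadata["inputs"]]
```

For the example metadata above, `input_spec()` returns `[("0", "FP32", [1, 224, 224, 3])]`, which is the input layout the model expects.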
Next steps
- Verify that your model is ready for inferencing.
- Query the model.
- Verify the model response.
- Optional: Get the model server metrics.
1.11.2. Getting your AI model ready for inference
Before querying your AI model through the API, prepare the request data in the format that the model expects. The following examples continue with the ovms-resnet50 model served by the OpenVINO Model Server (OVMS).
Prerequisites
- MicroShift is running.
- You have the xxd utility, which is part of the vim-common package.
- You configured the model-serving runtime.
- You uploaded your AI model to MicroShift.
Procedure
Download an image of a bee from the OpenVINO Model Server examples by running the following command:
$ curl -O https://raw.githubusercontent.com/openvinotoolkit/model_server/main/demos/common/static/images/bee.jpeg
Create the request data by running the following commands in the same shell session that you use to query the model, so that the HEADER_LEN variable is still set:
IMAGE=./bee.jpeg
REQ=./request.json
# Add an inference header
echo -n '{"inputs" : [{"name": "0", "shape": [1], "datatype": "BYTES"}]}' > "${REQ}"
# Get the size of the inference header
HEADER_LEN="$(stat -c %s "${REQ}")"
# Add size of the data (image) in binary format (4 bytes, little endian):
# print the size as 8 hex digits, split it into bytes, reverse the byte order, and convert to binary
printf "%08X" $(stat --format=%s "${IMAGE}") | sed 's/\(..\)/\1\n/g' | tac | tr -d '\n' | xxd -r -p >> "${REQ}"
# Add the data, that is, append the image to the request file
cat "${IMAGE}" >> "${REQ}"
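The request-building script can also be sketched in Python, which makes the binary layout explicit: the JSON inference header, then the image size as a 4-byte little-endian integer (`struct.pack("<I", ...)`), then the raw image bytes. `build_request` is a hypothetical helper, not an OVMS tool. Because `json.dumps` spaces the header slightly differently from the `echo` command, the header length can differ from the shell version, so use the value that the function returns for the `Inference-Header-Content-Length` header.

```python
import json
import struct
from pathlib import Path


def build_request(image_path: str, request_path: str) -> int:
    """Write the request file and return the inference header length."""
    header = json.dumps(
        {"inputs": [{"name": "0", "shape": [1], "datatype": "BYTES"}]}
    ).encode("utf-8")
    image = Path(image_path).read_bytes()
    with open(request_path, "wb") as f:
        f.write(header)                         # JSON inference header
        f.write(struct.pack("<I", len(image)))  # image size, 4 bytes, little endian
        f.write(image)                          # raw image bytes
    return len(header)  # value for Inference-Header-Content-Length
```

For example, `header_len = build_request("./bee.jpeg", "./request.json")` produces a file with the same structure as the shell script output.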
1.11.3. Querying your AI model
Make an inference request against the AI model server that is using the ovms-resnet50 model.
Prerequisites
- MicroShift is running.
- You configured the model-serving runtime.
- You uploaded your AI model to MicroShift.
Procedure
Make an inference request against the model server that is using the ovms-resnet50 model by running the following command:
$ curl \
    --data-binary "@./request.json" \
    --header "Inference-Header-Content-Length: ${HEADER_LEN}" \
    "${DOMAIN}/v2/models/ovms-resnet50/infer" \
    --connect-to "${DOMAIN}::${IP}:" > response.json
Example inferencing output, saved to a response.json file
{
    "model_name": "ovms-resnet50",
    "model_version": "1",
    "outputs": [{
        "name": "1463",
        "shape": [1, 1000],
        "datatype": "FP32",
        "data": [ ....... ]
    }]
}
- The contents of .outputs[0].data were omitted from the example for brevity.
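A Python sketch of the same request, assuming the request.json file and header length from the previous procedure and the node IP and route host from the accessibility check. The `infer` function is hypothetical. It sends the route host in the `Host` header to mimic `--connect-to`, and, unlike curl with `--data-binary`, it sets no `Content-Type` header; the sketch assumes the server accepts the body without one.

```python
import http.client
import json
from pathlib import Path


def infer(ip: str, domain: str, request_path: str, header_len: int,
          model: str = "ovms-resnet50", port: int = 80) -> dict:
    """POST the request file to the model's infer endpoint and decode the JSON reply."""
    body = Path(request_path).read_bytes()
    conn = http.client.HTTPConnection(ip, port, timeout=60)
    try:
        conn.request(
            "POST",
            f"/v2/models/{model}/infer",
            body=body,  # http.client adds Content-Length for a bytes body
            headers={
                "Host": domain,  # route host, as with curl --connect-to
                "Inference-Header-Content-Length": str(header_len),
            },
        )
        resp = conn.getresponse()
        payload = resp.read()
        if resp.status != 200:
            raise RuntimeError(f"inference failed: HTTP {resp.status}: {payload[:200]!r}")
        return json.loads(payload)
    finally:
        conn.close()
```

Writing the returned dictionary with `json.dump` gives a file equivalent to response.json.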
Verification
To determine the model’s prediction, get the index of the highest element in .outputs[0].data by using the following Python script:
import json

with open('response.json') as f:
    response = json.load(f)
data = response["outputs"][0]["data"]
argmax = data.index(max(data))
print(argmax)
Example output
309
- In this example, the element at index 309 is the model’s response.
Validate the output against the expected label in the ResNet input data, for example:
../../../../demos/common/static/images/bee.jpeg 309
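To look beyond the single highest score, you can rank all scores. This Python sketch uses hypothetical helper names: it returns the top-k class indices from response.json and checks whether the expected label from an `<image path> <label>` line, such as the bee.jpeg line above, is among them. With `k=1` it reproduces the argmax check.

```python
import json


def top_k(scores: list, k: int = 5) -> list:
    """Return (class_index, score) pairs for the k highest scores, best first."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in ranked[:k]]


def parse_label_line(line: str) -> tuple:
    """Split an '<image path> <label>' line into (path, label)."""
    path, label = line.rsplit(maxsplit=1)
    return path, int(label)


def check_prediction(response_path: str, label_line: str, k: int = 1) -> bool:
    """True if the expected label is among the model's top-k predictions."""
    with open(response_path) as f:
        scores = json.load(f)["outputs"][0]["data"]
    _, expected = parse_label_line(label_line)
    return expected in [i for i, _ in top_k(scores, k)]
```

For example, `check_prediction("response.json", "../../../../demos/common/static/images/bee.jpeg 309")` returns True when the model's highest score is at index 309.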
Next steps
- Optional: Query the AI model using other images available in the ResNet input data.
1.11.4. Getting the model-server metrics
After making a query, you can get the model server’s metrics to identify bottlenecks, optimize resource allocation, and ensure efficient infrastructure utilization.
You can alternatively configure OpenTelemetry for MicroShift to get model-server metrics. See "Adding Open Telemetry to an InferenceService custom resource" for more information.
Prerequisites
- MicroShift is running.
- You have made enough queries to the model to provide the metrics data you want to see.
Procedure
Get the Prometheus-format metrics of the model server by sending a request to the /metrics endpoint with the following command:
$ curl "${DOMAIN}/metrics" --connect-to "${DOMAIN}::${IP}:"
Partial example output
# HELP ovms_requests_success Number of successful requests to a model or a DAG.
# TYPE ovms_requests_success counter
ovms_requests_success{api="KServe",interface="REST",method="ModelReady",name="ovms-resnet50"} 4
ovms_requests_success{api="KServe",interface="REST",method="ModelMetadata",name="ovms-resnet50",version="1"} 1
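To work with these metrics programmatically, the following minimal Python sketch parses simple Prometheus text-format lines of the form `name{labels} value`, like the sample above, and totals `ovms_requests_success` per method for one model. The helper names are hypothetical, and this is not a complete parser of the exposition format (for example, it keeps label escape sequences as-is); the `prometheus_client` package provides `prometheus_client.parser.text_string_to_metric_families` for full parsing.

```python
import re
from collections import defaultdict

# metric name, optional {label="value",...} block, then the sample value
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text: str) -> list:
    """Return (metric_name, labels_dict, value) for every sample line."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        m = SAMPLE_RE.match(line)
        if m is None:
            continue
        name, raw_labels, value = m.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def requests_by_method(samples: list, model: str) -> dict:
    """Sum ovms_requests_success per API method for one model name."""
    totals = defaultdict(float)
    for name, labels, value in samples:
        if name == "ovms_requests_success" and labels.get("name") == model:
            totals[labels.get("method", "")] += value
    return dict(totals)
```

For the partial output above, `requests_by_method(parse_metrics(text), "ovms-resnet50")` returns `{"ModelReady": 4.0, "ModelMetadata": 1.0}`.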