Chapter 2. Configuring TrustyAI
To configure model monitoring with TrustyAI for data scientists to use in OpenShift AI, a cluster administrator performs the following tasks:
- Configure monitoring for the model serving platform
- Enable the TrustyAI component in the Red Hat OpenShift AI Operator
- Configure TrustyAI to use a database, if you want to use a database instead of a persistent volume claim (PVC) for TrustyAI storage
- Install the TrustyAI service on each data science project that contains models that the data scientists want to monitor
2.1. Configuring monitoring for your model serving platform
OpenShift AI provides the following model serving platforms:
- Single-model serving platform
- For deploying large models such as large language models (LLMs), OpenShift AI includes a single-model serving platform that is based on the KServe component. Each model is deployed on its own model server. Use the single-model serving platform in situations where you need to deploy, monitor, scale, and maintain large models that require increased resources.
- Multi-model serving platform
- For deploying small and medium-sized models, OpenShift AI includes a multi-model serving platform that is based on the ModelMesh component. On the multi-model serving platform, you can deploy multiple models on the same model server. Each of the deployed models shares the server resources. This approach can be useful on OpenShift AI clusters that have finite compute resources or pods.
The process for configuring monitoring is the same for both the single-model (KServe) and the multi-model (ModelMesh) serving platforms.
Prerequisites
- You have cluster administrator privileges for your OpenShift cluster.
- You have downloaded and installed the OpenShift command-line interface (CLI). See Installing the OpenShift CLI.
- You are familiar with creating a config map for monitoring a user-defined workflow. You will perform similar steps in this procedure.
- You are familiar with enabling monitoring for user-defined projects in OpenShift. You will perform similar steps in this procedure.
- You have assigned the monitoring-rules-view role to users that will monitor metrics.
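A cluster administrator can grant this role per project from the CLI. The following command is a sketch; the user name and project name are placeholders:

```
$ oc policy add-role-to-user monitoring-rules-view <username> -n <project_name>
```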
Procedure
- In a terminal window, if you are not already logged in to your OpenShift cluster as a cluster administrator, log in to the OpenShift CLI as shown in the following example:

  ```
  $ oc login <openshift_cluster_url> -u <admin_username> -p <password>
  ```

- Define a ConfigMap object in a YAML file called uwm-cm-conf.yaml with the following contents:

  ```yaml
  apiVersion: v1
  kind: ConfigMap
  metadata:
    name: user-workload-monitoring-config
    namespace: openshift-user-workload-monitoring
  data:
    config.yaml: |
      prometheus:
        logLevel: debug
        retention: 15d
  ```

  The user-workload-monitoring-config object configures the components that monitor user-defined projects. Observe that the retention time is set to the recommended value of 15 days.

- Apply the configuration to create the user-workload-monitoring-config object:

  ```
  $ oc apply -f uwm-cm-conf.yaml
  ```

- Define another ConfigMap object in a YAML file called uwm-cm-enable.yaml with the following contents:

  ```yaml
  apiVersion: v1
  kind: ConfigMap
  metadata:
    name: cluster-monitoring-config
    namespace: openshift-monitoring
  data:
    config.yaml: |
      enableUserWorkload: true
  ```

  The cluster-monitoring-config object enables monitoring for user-defined projects.

- Apply the configuration to create the cluster-monitoring-config object:

  ```
  $ oc apply -f uwm-cm-enable.yaml
  ```
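As an optional verification step, you can confirm that monitoring for user-defined projects started successfully by checking that the monitoring pods are running:

```
$ oc get pods -n openshift-user-workload-monitoring
```

When monitoring is enabled, the prometheus-operator and prometheus-user-workload pods appear in the Running state.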
2.2. Enabling the TrustyAI component
To allow your data scientists to use model monitoring with TrustyAI, you must enable the TrustyAI component in OpenShift AI.
Prerequisites
- You have cluster administrator privileges for your OpenShift cluster.
- You have access to the data science cluster.
- You have installed Red Hat OpenShift AI.
Procedure
- In the OpenShift console, click Operators → Installed Operators.
- Search for the Red Hat OpenShift AI Operator, and then click the Operator name to open the Operator details page.
- Click the Data Science Cluster tab.
- Click the default instance name (for example, default-dsc) to open the instance details page.
- Click the YAML tab to show the instance specifications.
- In the spec.components section, set the managementState field for the trustyai component to Managed:

  ```yaml
  trustyai:
    managementState: Managed
  ```

- Click Save.
Verification
Check the status of the trustyai-service-operator pod:
- In the OpenShift console, from the Project list, select redhat-ods-applications.
- Click Workloads → Deployments.
- Search for the trustyai-service-operator-controller-manager deployment and check its status:
- Click the deployment name to open the deployment details page.
- Click the Pods tab.
- View the pod status. When the status of the trustyai-service-operator-controller-manager-<pod-id> pod is Running, the pod is ready to use.
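Alternatively, you can perform the same check from the CLI. This assumes the default redhat-ods-applications namespace:

```
$ oc get pods -n redhat-ods-applications | grep trustyai-service-operator
```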
2.3. Configuring TrustyAI with a database
If you have a relational database in your OpenShift cluster such as MySQL or MariaDB, you can configure TrustyAI to use your database instead of a persistent volume claim (PVC). Using a database instead of a PVC for storage can improve scalability, performance, and data management in TrustyAI. Provide TrustyAI with a database configuration secret before deployment. You can create a secret or specify the name of an existing Kubernetes secret within your project.
Prerequisites
- You have cluster administrator privileges for your OpenShift cluster.
- You have downloaded and installed the OpenShift command-line interface (CLI). See Installing the OpenShift CLI.
- You have enabled the TrustyAI component, as described in Enabling the TrustyAI component.
- The data scientist has created a data science project, as described in Creating a data science project, that contains the models that the data scientist wants to monitor.
- If you are configuring the TrustyAI service with an external MySQL database, your database must already be in your cluster and use at least MySQL version 5.x. However, Red Hat recommends that you use MySQL version 8.x.
- If you are configuring the TrustyAI service with a MariaDB database, your database must already be in your cluster and use MariaDB version 10.3 or later. However, Red Hat recommends that you use at least MariaDB version 10.5.
Note: The Transport Layer Security (TLS) protocol does not work with MariaDB Operator version 0.29 or later.
Procedure
- In a terminal window, if you are not already logged in to your OpenShift cluster as a cluster administrator, log in to the OpenShift CLI as shown in the following example:

  ```
  $ oc login <openshift_cluster_url> -u <admin_username> -p <password>
  ```

- Optional: If you want to use a TLS connection between TrustyAI and the database, create a TrustyAI service database TLS secret that uses the same certificates that you want to use for the database:
  - Create a YAML file to contain your TLS secret and add the following code:

    ```yaml
    apiVersion: v1
    kind: Secret
    metadata:
      name: <service_name>-db-tls
    type: kubernetes.io/tls
    data:
      tls.crt: |
        <TLS CERTIFICATE>
      tls.key: |
        <TLS KEY>
    ```

  - Save the file with the file name <service_name>-db-tls.yaml. For example, if your service name is trustyai-service, save the file as trustyai-service-db-tls.yaml.
  - Apply the YAML file in the data science project that contains the models that the data scientist wants to monitor:

    ```
    $ oc apply -f <service_name>-db-tls.yaml -n <project_name>
    ```
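As an alternative to writing the secret YAML by hand, you can create the same TLS secret directly from your certificate and key files. The file paths shown here are placeholders:

```
$ oc create secret tls <service_name>-db-tls \
  --cert=<path_to_tls.crt> \
  --key=<path_to_tls.key> \
  -n <project_name>
```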
- Create a secret (or specify an existing one) that has your database credentials:
  - Create a YAML file to contain your secret and add the following code:

    ```yaml
    apiVersion: v1
    kind: Secret
    metadata:
      name: db-credentials
    type: Opaque
    stringData:
      databaseKind: mariadb                 # 1
      databaseUsername: <TrustyAI_username> # 2
      databasePassword: <TrustyAI_password> # 3
      databaseService: mariadb-service      # 4
      databasePort: 3306                    # 5
      databaseGeneration: update            # 6
      databaseName: trustyai_service        # 7
    ```

    1. The only currently supported databaseKind value is mariadb.
    2. The username that you want TrustyAI to use when interfacing with the database.
    3. The password that TrustyAI must use when connecting to the database.
    4. The Kubernetes service that TrustyAI must use when connecting to the database (the default is mariadb).
    5. The port that TrustyAI must use when connecting to the database (the default is 3306).
    6. The database schema generation strategy for TrustyAI. This is the setting for the quarkus.hibernate-orm.database.generation argument, which determines how TrustyAI interacts with the database on its initial connection. Set to none, create, drop-and-create, drop, update, or validate.
    7. The name of the individual database within the database service that the username and password authenticate to, and the specific database that TrustyAI reads from and writes to on the database server.
  - Save the file with the file name db-credentials.yaml. You will need this name later when you install or change the TrustyAI service.
  - Apply the YAML file in the data science project that contains the models that the data scientist wants to monitor:

    ```
    $ oc apply -f db-credentials.yaml -n <project_name>
    ```
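Optionally, you can confirm that the secret exists and lists the expected keys without revealing their values:

```
$ oc describe secret db-credentials -n <project_name>
```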
If you are installing TrustyAI for the first time on a project, continue to Installing the TrustyAI service for a project.
If you already installed TrustyAI on a project, you can migrate the existing TrustyAI service from using a PVC to using a database.
- Create a YAML file to update the TrustyAI service custom resource (CR) and add the following code:

  ```yaml
  apiVersion: trustyai.opendatahub.io/v1alpha1
  kind: TrustyAIService
  metadata:
    annotations:
      trustyai.opendatahub.io/db-migration: "true"        # 1
    name: trustyai-service                                # 2
  spec:
    storage:
      format: "DATABASE"                                  # 3
      folder: "/inputs"                                   # 4
      size: "1Gi"                                         # 5
    databaseConfigurations: <database_secret_credentials> # 6
    data:
      filename: "data.csv"                                # 7
    metrics:
      schedule: "5s"                                      # 8
  ```

  1. Set to true to prompt the migration from PVC to database storage.
  2. The name of the TrustyAI service instance.
  3. The storage format for the data. Set this field to DATABASE.
  4. The location within the PVC where you were storing the data. This must match the value specified in the existing CR.
  5. The size of the data to request.
  6. The name of the secret with your database credentials that you created in an earlier step. For example, db-credentials.
  7. The suffix for the existing stored data files. This must match the value specified in the existing CR.
  8. The interval at which to calculate the metrics. The default is 5s. The duration is specified with the ISO-8601 format. For example, 5s for 5 seconds, 5m for 5 minutes, and 5h for 5 hours.
- Save the file. For example, trustyai_crd.yaml.
- Apply the new TrustyAI service CR to the data science project that contains the models that the data scientist wants to monitor:

  ```
  $ oc apply -f trustyai_crd.yaml -n <project_name>
  ```
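You can then watch the migration progress by checking the TrustyAI service CR and its pods. These commands are a sketch and assume the instance name trustyai-service:

```
$ oc get trustyaiservice trustyai-service -n <project_name>
$ oc get pods -n <project_name> | grep trustyai
```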
2.4. Installing the TrustyAI service for a project
Install the TrustyAI service on a data science project to provide access to its features for all models deployed within that project. An instance of the TrustyAI service is required for each data science project, or namespace, that contains models that the data scientists want to monitor.
Install only one instance of the TrustyAI service in a project. Multiple instances in the same project can result in unexpected behavior.
Installing TrustyAI into a namespace where non-OVMS models are deployed can cause errors in the TrustyAI service.
2.4.1. Installing the TrustyAI service by using the CLI
You can use the OpenShift command-line interface (CLI) to install an instance of the TrustyAI service.
Prerequisites
- You have cluster administrator privileges for your OpenShift cluster.
- You have downloaded and installed the OpenShift command-line interface (CLI). See Installing the OpenShift CLI.
- You have configured monitoring for the model serving platform, as described in Configuring monitoring for your model serving platform.
- You have enabled the TrustyAI component, as described in Enabling the TrustyAI component.
- If you are using TrustyAI with a database instead of PVC, you have configured TrustyAI to use the database, as described in Configuring TrustyAI with a database.
- The data scientist has created a data science project, as described in Creating a data science project, that contains the models that the data scientist wants to monitor.
Procedure
- Open a new terminal window.
- Follow these steps to log in to your OpenShift cluster as a cluster administrator:
- In the OpenShift web console, click your user name and select Copy login command.
- After you have logged in, click Display token.
  - Copy the Log in with this token command and paste it in the OpenShift command-line interface (CLI):

    ```
    $ oc login --token=<token> --server=<openshift_cluster_url>
    ```
- Navigate to the data science project that contains the models that the data scientist wants to monitor:

  ```
  oc project <project_name>
  ```

  For example:

  ```
  oc project my-project
  ```

- Create a TrustyAIService custom resource (CR) file, for example trustyai_crd.yaml:

Example CR file for TrustyAI using a database
```yaml
apiVersion: trustyai.opendatahub.io/v1alpha1
kind: TrustyAIService
metadata:
  name: trustyai-service                                # 1
spec:
  storage:
    format: "DATABASE"                                  # 2
    size: "1Gi"                                         # 3
  databaseConfigurations: <database_secret_credentials> # 4
  metrics:
    schedule: "5s"                                      # 5
```

1. The name of the TrustyAI service instance.
2. The storage format for the data, either DATABASE or PVC (persistent volume claim). Red Hat recommends that you use a database setup for better scalability, performance, and data management in TrustyAI.
3. The size of the data to request.
4. The name of the secret with your database credentials that you created in Configuring TrustyAI with a database. For example, db-credentials.
5. The interval at which to calculate the metrics. The default is 5s. The duration is specified with the ISO-8601 format. For example, 5s for 5 seconds, 5m for 5 minutes, and 5h for 5 hours.
Example CR file for TrustyAI using a PVC

```yaml
apiVersion: trustyai.opendatahub.io/v1alpha1
kind: TrustyAIService
metadata:
  name: trustyai-service   # 1
spec:
  storage:
    format: "PVC"          # 2
    folder: "/inputs"      # 3
    size: "1Gi"            # 4
  data:
    filename: "data.csv"   # 5
    format: "CSV"          # 6
  metrics:
    schedule: "5s"         # 7
    batchSize: 5000        # 8
```

1. The name of the TrustyAI service instance.
2. The storage format for the data, either DATABASE or PVC (persistent volume claim).
3. The location within the PVC where you want to store the data.
4. The size of the PVC to request.
5. The suffix for the stored data files.
6. The format of the data. Currently, only comma-separated value (CSV) format is supported.
7. The interval at which to calculate the metrics. The default is 5s. The duration is specified with the ISO-8601 format. For example, 5s for 5 seconds, 5m for 5 minutes, and 5h for 5 hours.
8. Optional: The observation's historical window size to use for metrics calculation. The default is 5000, which means that the metrics are calculated using the 5,000 latest inferences.
- Add the TrustyAI service CR to your project:

  ```
  oc apply -f trustyai_crd.yaml
  ```

  This command returns output similar to the following:

  ```
  trustyai-service created
  ```
Verification
Verify that you installed the TrustyAI service:

```
oc get pods | grep trustyai
```

You should see a response similar to the following:

```
trustyai-service-5d45b5884f-96h5z 1/1 Running
```
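The TrustyAI service also exposes a route in the project. If you want to query the service endpoint, for example to read metrics, you can retrieve the route host; the route name is assumed here to match the service instance name:

```
oc get route trustyai-service -n <project_name>
```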