Chapter 3. Data virtualization operator
The data virtualization operator automates the configuration and deployment of virtual databases. After you add the operator to your OpenShift cluster, you can use it to build and deploy virtual database images from a range of data sources.
In this Technology Preview release, the preferred method for deploying virtual databases to OpenShift is to use the data virtualization operator. The method of creating and deploying virtual databases with Fabric8 and the Spring Boot Maven plugin is deprecated. Currently, virtual databases that you create with the data virtualization operator are not available in Fuse Online.
You can install the data virtualization operator on OpenShift 3.11 or later. On OpenShift 4.2 and later, the operator is available from the OperatorHub.
Prerequisites
- You have cluster-admin rights, which are required to add operators to the OpenShift cluster.
- You have the 2019-12 release of Red Hat Integration.
- You have Developer access to an OpenShift server and you are familiar with using the OpenShift console and CLI.
- You have a DDL file for the virtual database that you want to create, or you know how to write SQL code and create DDL files.
- You have Maven 3.0 installed.
- You have JDK 8 (Java Platform, Standard Edition 8 Development Kit) or later installed.
You can connect to one of the following types of data sources:
- Relational database
- MongoDB
- Salesforce
- REST
- OData
- OpenAPI
- You have administrator access to the configured data source.
3.1. Installing the data virtualization operator on OpenShift
Install the data virtualization operator so that you can use it to deploy virtual database images to OpenShift from YAML-based custom resources (CRs).
Prerequisites
- You have cluster-admin access to an OpenShift 3.11 or later cluster.
- You can use the oc command-line tool to interact with your OpenShift 3.11 cluster, or you have access to the OpenShift 4.2 or later web console.
Procedure
- Install the operator using one of the following methods, depending on the version of OpenShift that you are running.
Installing on OpenShift 3.11
- From a terminal window, log in to the OpenShift cluster as a cluster administrator:
oc login
- Create or open a project where you want to deploy a virtual database.
- Type the following commands to deploy the operator:
export OP_ROOT=https://raw.githubusercontent.com/teiid/teiid-operator/7.5-0.0.x/deploy
oc create -f $OP_ROOT/crds/virtualdatabase.crd.yaml 1
oc create -f $OP_ROOT/service_account.yaml
oc create -f $OP_ROOT/role.yaml
oc create -f $OP_ROOT/role_binding.yaml
oc create -f $OP_ROOT/operator.yaml
- 1
- If you previously created a CRD in the cluster, the command returns an error, reporting that the CRD already exists. You can ignore the message.
- Type the following commands to create a pull secret that you can use to access the Red Hat image registry:
oc create secret docker-registry dv-pull-secret \
  --docker-server=registry.redhat.io \
  --docker-username=$username \ 1
  --docker-password=$password \
  --docker-email=$email_address
oc secrets link builder dv-pull-secret
oc secrets link builder dv-pull-secret --for=pull
- 1
- Substitute the user name and password that you use to log in to the Red Hat Customer Portal.
If the commands complete without errors, the operator is deployed to your OpenShift instance within the current OpenShift project.
- To enable the data virtualization operator to retrieve images from the Red Hat registry so that you can create virtual databases, link the secret that you created in Step 4 to the service account for the operator:
oc secrets link dv-operator dv-pull-secret --for=pull
Installing on OpenShift 4.2 or later
- From a terminal window, type the following commands to log in to the OpenShift cluster and create a pull secret that you can use to access the Red Hat image registry:
oc login
oc create secret docker-registry dv-pull-secret \
  --docker-server=registry.redhat.io \
  --docker-username=$username \ 1
  --docker-password=$password \
  --docker-email=$email_address
oc secrets link builder dv-pull-secret
oc secrets link builder dv-pull-secret --for=pull
- 1
- Use your Red Hat Customer Portal login credentials.
- Log in to the OpenShift web console as a cluster administrator.
- From the OpenShift menu, expand Operators and click OperatorHub.
- Click Data Virtualization Operator 7.5.0 provided by Red Hat, Inc., and then click Install.
- From the Create Operator Subscription page, verify that the selected namespace matches the name of the project where you want to install the operator, and then click Subscribe.
The Installed Operators page lists the Data Virtualization Operator and reports the status of the installation.
- From the OpenShift menu, expand Workloads and click Pods to check the status of the operator pod. After a few minutes, the pod for the operator service begins to run.
- To enable the data virtualization operator to retrieve images from the Red Hat registry so that you can create virtual databases, link the secret that you created in Step 1 to the service account for the operator:
oc secrets link dv-operator dv-pull-secret --for=pull
3.2. Running the data virtualization operator to deploy a virtual database
After a cluster administrator adds the data virtualization operator to an OpenShift cluster, other users can run the operator to create a virtual database.
The data virtualization operator processes a virtual database custom resource (CR) to create and deploy a virtual database object on OpenShift. You specify the configuration properties for the data source in the CR. Each type of data source requires a specific set of configuration properties. When you run the operator, you provide it with a .yaml
file that contains the CR. By running the operator with different CRs, you can create virtual databases from a range of data sources. For more information about specifying data source properties, see Section 3.4, “Custom resources to support data sources”.
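For reference, the following sketch shows the overall shape that such a CR takes for a relational data source. It is modeled on the dv-customer example that appears later in this procedure; the resource name, the sampledb data source name, and the connection values are placeholders, and you should compare the field layout against the sample CR that your version of the operator provides.
apiVersion: teiid.io/v1alpha1
kind: VirtualDatabase
metadata:
  name: dv-customer
spec:
  replicas: 1
  env:
    - name: SPRING_DATASOURCE_SAMPLEDB_JDBCURL
      value: 'jdbc:postgresql://postgresql.myproject.svc:5432/sampledb'
    - name: SPRING_DATASOURCE_SAMPLEDB_USERNAME
      value: user
    - name: SPRING_DATASOURCE_SAMPLEDB_PASSWORD
      value: changeit
  build:
    source:
      ddl: |
        CREATE DATABASE customer OPTIONS (ANNOTATION 'Customer VDB');
        USE DATABASE customer;
        -- server, schema, and view definitions follow; see Section 3.4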
In this Technology Preview, the data virtualization operator can create virtual databases from the following data sources only:
- Relational databases
- Salesforce databases
- MongoDB
- OpenAPI
- OData
Prerequisites
- You have access to an OpenShift cluster in which the data virtualization operator is installed.
- The operator has access to the Maven repositories that contain the dependencies that the build requires.
- OpenShift can access a supported data source that runs on your network.
- You have the login credentials to access the data source.
- You have a CR in .yaml format that provides information about how to create the virtual database.
Procedure
- From a terminal window, log in to OpenShift and open the project where you want to create the virtual database.
- On your computer, change to the directory that contains the .yaml file for the CR, and then type the following command to run the operator and create the virtual database:
oc create -f <cr_filename.yaml>
Replace <cr_filename.yaml> with the name of the CR file for your data source. For example:
oc create -f dv-customer.yaml
After the deployment completes, a service is added to the OpenShift cluster. The name of the service matches the name of the custom resource.
- Type the following command to verify that the virtual database is created:
oc get vdbs
OpenShift returns the list of virtual databases in the project.
The deployed service supports connections from the following clients:
- JDBC clients through port 31000.
- PostgreSQL clients, including ODBC clients, through port 35432.
- OData clients, through an HTTP endpoint and route.
For OpenShift to create an HTTP endpoint, the value of the property spec/exposeVia3scale must be set to false in the CR. If the value is set to true, it is assumed that 3scale manages the endpoint, and no HTTP endpoint is created.
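For example, the following CR fragment (the metadata name is a placeholder) disables 3scale integration so that the operator creates the HTTP endpoint and route directly:
apiVersion: teiid.io/v1alpha1
kind: VirtualDatabase
metadata:
  name: dv-customer
spec:
  exposeVia3scale: false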
3.3. Creating a virtual database from the OpenShift web console
After a cluster administrator adds the data virtualization operator from the OperatorHub, other users can run the operator from the web console to create a virtual database.
When you run the operator from the web console, you can use the embedded editor to create the custom resource (CR) that defines the virtual database. The CR includes properties that specify how to connect to the data source, and a DDL section that determines how information in the data source is used in your virtual database. You use YAML or JSON to define connection properties, and you use SQL-MED in the DDL section to specify the virtual database schema.
The editor includes a sample CR that you can edit in place, or download and edit offline. You can also use drag and drop to upload local copies of properties definitions or DDL files. For information about the properties to use for different data sources, see Section 3.4, “Custom resources to support data sources”.
In this Technology Preview, the data virtualization operator can create virtual databases from the following data sources only:
- Relational databases
- Salesforce databases
- MongoDB
- OpenAPI
- OData
Prerequisites
- You have web console access to an OpenShift cluster in which the data virtualization operator is installed.
- The operator has access to the Maven repositories that contain the dependencies that the build requires.
- OpenShift can access a supported data source that runs on your network.
- You have the login credentials to access the data source.
- You have a CR in .yaml format that provides information about how to create the virtual database.
Procedure
- From the OpenShift menu, expand Operators and click Installed Operators.
- Click Data Virtualization Operator.
- Click the VirtualDatabase tab.
- Click Create Virtual Database.
OpenShift displays a sample custom resource.
- Edit the sample CR as needed to specify the details of the virtualization, and then click Create.
OpenShift lists the virtual database service on the VirtualDatabase tab.
- From the OpenShift menu, expand Workloads and click Pods to check the status of the pod that runs the new virtualization. After it is running, you are ready to use the new virtualization.
- To view the protocols that the virtualization exposes, from the OpenShift menu, expand Networking, click Services, and then click the service with the name of the virtualization.
The deployed service automatically supports the following connection types:
- JDBC - Provides access for SQL clients over port 31000.
- PostgreSQL - Provides access for ODBC clients and other PostgreSQL clients over port 35432.
- HTTP - Provides access for OData and REST clients over port 8080.
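For example, assuming that the virtualization service is named dv-customer in the myproject namespace, and that the DDL in the CR creates a virtual database named customer, a JDBC client inside the cluster could use a Teiid connection URL of the following form. The service, namespace, and database names are placeholders for your own values.
Sample Teiid JDBC connection URL
jdbc:teiid:customer@mm://dv-customer.myproject.svc:31000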
3.4. Custom resources to support data sources
Before you can use the data virtualization operator to create a virtual database, you must specify properties for the data source in a custom resource (CR) file.
By providing configuration information in a CR, you give the operator the directions that it needs to create virtual databases from a range of data source types. When you run the data virtualization operator, it reads information from the CR to determine the type of the data source, its schema structure, and how to connect to it and authenticate with it.
The CR uses SQL Data Definition Language (DDL) commands to describe the schema of the virtual database and the data source, the data that you want to import into the virtual database, and the mapping between the source schema and the virtual schema. The CR also specifies the translator to use to convert the format of commands and data that pass between the virtual database and the data source.
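The following fragment sketches how those elements appear in the build/source/ddl section of a CR. It follows the pattern of the operator's dv-customer sample; the sampledb server, accounts source schema, and portfolio virtual schema are placeholder names, and the postgresql foreign data wrapper is only an example of a translator.
spec:
  build:
    source:
      ddl: |
        CREATE DATABASE customer OPTIONS (ANNOTATION 'Customer VDB');
        USE DATABASE customer;
        CREATE FOREIGN DATA WRAPPER postgresql;
        CREATE SERVER sampledb TYPE 'NONE' FOREIGN DATA WRAPPER postgresql;
        CREATE SCHEMA accounts SERVER sampledb;
        CREATE VIRTUAL SCHEMA portfolio;
        SET SCHEMA accounts;
        IMPORT FOREIGN SCHEMA public FROM SERVER sampledb INTO accounts;
        SET SCHEMA portfolio;
        CREATE VIEW all_customers AS SELECT * FROM accounts.customer;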
You can specify the values of configuration properties directly in the CR file, or you can reference values that you define in OpenShift secrets. For more information about creating secrets, see Section 4.2, “Creating and deploying secrets”.
Period characters (.) are not valid for use in environment variables. When you add variables to the CR, replace period characters with underscore characters (_). At runtime, underscores in the variables are converted automatically to periods.
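For example, assuming that the operator passes spec/env entries through as standard Kubernetes environment variables, you can reference a key in an OpenShift secret instead of embedding a password in the CR. The secret name and key in this sketch are placeholders; see Section 4.2 for how to create the secret.
spec:
  env:
    - name: SPRING_DATASOURCE_SAMPLEDB_PASSWORD
      valueFrom:
        secretKeyRef:
          name: sampledb-secret
          key: password
The variable name uses underscores, which, as noted above, are converted to periods at runtime.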
3.4.1. Settings to configure relational databases as data sources
The custom resource (CR) for a relational database management system (RDBMS), such as PostgreSQL or Microsoft SQL Server, must contain specific information so that the data virtualization operator can create a virtual database from the database.
The following tables list some of the properties that are required in a CR to create a virtual database that is based on a relational database. The values that you assign to these properties vary according to your environment and the specific database technology that you use.
Property Name | Description | Required | Default value |
---|---|---|---|
SPRING_DATASOURCE_<NAME>_JDBCURL | URL for the connection | Yes | n/a |
SPRING_DATASOURCE_<NAME>_USERNAME | User Name | Yes | n/a |
SPRING_DATASOURCE_<NAME>_PASSWORD | Password | Yes | n/a |
SPRING_DATASOURCE_<NAME>_DRIVER_CLASS_NAME∗ | JDBC driver class name | No | n/a |
SPRING_DATASOURCE_<NAME>_IMPORTER_SCHEMA_NAME | Schema Name for import | Yes | n/a |
In the preceding table, NAME specifies a name, in upper case, that refers to the data source. This same name string is used in the DDL that defines the virtual database to represent the data source server.
∗The JDBC driver class that you reference must be available as a Maven artifact from a repository that is listed in the spec/dependencies
section of the CR file. The virtualization operator must have access to download artifacts from the specified repository.
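For example, a CR for a hypothetical Microsoft SQL Server source named SAMPLEDB might combine the connection properties with a driver dependency as in the following sketch. The Maven coordinates, URL, and credentials are illustrative placeholders, and the exact location of the dependencies list should be checked against the sample CR for your operator version.
spec:
  env:
    - name: SPRING_DATASOURCE_SAMPLEDB_JDBCURL
      value: 'jdbc:sqlserver://mssql.myproject.svc:1433;databaseName=sampledb'
    - name: SPRING_DATASOURCE_SAMPLEDB_USERNAME
      value: user
    - name: SPRING_DATASOURCE_SAMPLEDB_PASSWORD
      value: changeit
    - name: SPRING_DATASOURCE_SAMPLEDB_DRIVER_CLASS_NAME
      value: com.microsoft.sqlserver.jdbc.SQLServerDriver
    - name: SPRING_DATASOURCE_SAMPLEDB_IMPORTER_SCHEMA_NAME
      value: dbo
  dependencies:
    - com.microsoft.sqlserver:mssql-jdbc:7.4.1.jre8
In the DDL, the corresponding server would use the sqlserver foreign data wrapper and the lower-case name sampledb.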
Along with the connection properties, you can define properties to control the behavior of the JDBC translator and to provide more control over how you import data from the source database.
Property Name | Description | Required | Value |
---|---|---|---|
FOREIGN DATA WRAPPER | Translator | Yes | <translator-name>∗ |
∗The translator value depends on the type of database. Translators are available for the following relational databases.
- db2
- derby
- h2
- hana (Connects to SAP HANA through a JDBC driver)
- hive*jdbc (Connects to Apache Hive through the Hive JDBC driver)
- hsql
- impala (Connects to Apache Impala)
- informix
- ingres
- jtds (Connects to Microsoft SQL Server through Java tabular data stream (JTDS) drivers)
- mysql5
- netezza
- oracle
- osisoft-pi (Connects to an OSIsoft PI Data Archive through the OSIsoft JDBC driver)
- phoenix (Connects to Apache Phoenix using the Phoenix JDBC driver)
- postgresql
- presto-jdbc (Connects to PrestoDB through a JDBC driver)
- sqlserver (Connects to Microsoft SQL Server)
- sybase
- teradata
- teiid (Connects to a Teiid virtual database through a JDBC driver)
- vertica
For a complete list of the translator properties that you can define for JDBC data sources, see JDBC Translators in the Teiid Reference Guide.
3.4.2. Settings to configure MongoDB as a data source
The custom resource (CR) for a MongoDB database must contain specific information so that the data virtualization operator can create a virtual database from the database.
The current release supports MongoDB release 3.
The following tables list the properties that are required in the CR to create a virtual database that is based on a MongoDB database:
Property Name | Description | Required | Default value |
---|---|---|---|
SPRING_TEIID_DATA_MONGODB_{NAME}_REMOTE_SERVER_LIST | List of MongoDB servers, for example localhost:27012 | Yes | n/a |
SPRING_TEIID_DATA_MONGODB_{NAME}_USER | User Name | Yes | n/a |
SPRING_TEIID_DATA_MONGODB_{NAME}_PASSWORD | Password | Yes | n/a |
SPRING_TEIID_DATA_MONGODB_{NAME}_DATABASE | Database name to connect to | Yes | n/a |
SPRING_TEIID_DATA_MONGODB_{NAME}_AUTH_DATABASE | Database name for authorization | No | n/a |
SPRING_TEIID_DATA_MONGODB_{NAME}_SSL | Use SSL Connection | No | n/a |
In the preceding table, NAME specifies a name, in upper case, that refers to the data source. This same name string is used in the DDL that defines the virtual database to represent the data source server.
Property Name | Description | Required | Value |
---|---|---|---|
FOREIGN DATA WRAPPER | Translator | Yes | mongodb |
For a complete list of the properties that you can set to control how data is translated between MongoDB and a virtual database, see the Teiid Reference Guide.
You are not required to list any build dependencies in the CR for a MongoDB virtual database.
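For example, the spec/env entries for a hypothetical MongoDB source named MONGODB1 might look like the following sketch; the server address, credentials, and database name are placeholders.
spec:
  env:
    - name: SPRING_TEIID_DATA_MONGODB_MONGODB1_REMOTE_SERVER_LIST
      value: mongodb.myproject.svc:27017
    - name: SPRING_TEIID_DATA_MONGODB_MONGODB1_USER
      value: user
    - name: SPRING_TEIID_DATA_MONGODB_MONGODB1_PASSWORD
      value: changeit
    - name: SPRING_TEIID_DATA_MONGODB_MONGODB1_DATABASE
      value: sampledb
In the DDL, the corresponding server uses the mongodb foreign data wrapper and the lower-case name mongodb1.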
3.4.3. Settings to configure REST, OData, and OpenAPI data sources
The custom resource (CR) for a REST-based data source must contain specific information so that the data virtualization operator can create a virtual database from the source.
The set of connection properties that are required for all REST-based data sources is the same. However, services that are based on standard specifications, such as OData or OpenAPI, require specific translators. For information about translator properties, see Table 3.6, “Foreign data wrapper settings in the build/source/ddl
section of a REST-based CR”.
For generic REST web services that are not based on a standard specification, the data virtualization service cannot automatically convert query criteria into a query parameter. Because these services lack built-in mechanisms to pass SQL query conditions to a REST API endpoint, you must use the invokeHttp
procedure to pass the query as an XML or JSON payload, and you must specify all query strings and headers.
By default, translators are unable to parse the security configuration of a secured API. To enable translators to access data for a secured API, the CR must specify the security properties for the API.
Property Name | Description | Required | Default value |
---|---|---|---|
SPRING_TEIID_REST_<NAME>_ENDPOINT | Endpoint for the service | Yes | n/a |
SPRING_TEIID_REST_<NAME>_SECURITY_TYPE | Security type used. Available options are http-basic, openid-connect or empty | No | no security |
SPRING_TEIID_REST_<NAME>_USER_NAME | User Name | Yes | n/a |
SPRING_TEIID_REST_<NAME>_PASSWORD | Password | Yes | n/a |
SPRING_TEIID_REST_<NAME>_CLIENT_ID | ClientId from connected app | Yes, when the security type is defined as openid-connect | n/a |
SPRING_TEIID_REST_<NAME>_CLIENT_SECRET | clientSecret from connected app | Yes, when the security type is defined as openid-connect | n/a |
SPRING_TEIID_REST_<NAME>_AUTHORIZE_URL | Authorization endpoint URL of the identity provider | Yes, when the security type is defined as openid-connect | n/a |
SPRING_TEIID_REST_<NAME>_ACCESS_TOKEN_URL | Access token endpoint URL of the identity provider | Yes, when the security type is defined as openid-connect | n/a |
SPRING_TEIID_REST_<NAME>_SCOPE | Scope to request for the access token | No. Applies when the security type is defined as openid-connect | n/a |
SPRING_TEIID_REST_<NAME>_REFRESH_TOKEN∗ | Refresh token for the virtual database service | No | n/a |
∗ When security is set to openid-connect, you can use refresh tokens to authenticate, instead of using name and password authentication. Information about obtaining refresh tokens is beyond the scope of this document.
In the Table 3.5, “Properties in the spec/env
section of REST-based CRs”, NAME indicates the name, in upper case, that represents the data source. The NAME value is also used in lower case in the build/source/ddl
section of the CR to represent the data source server.
Data source | Description | Required | Value |
---|---|---|---|
REST | Translator | Yes | rest |
OData | Translator | Yes | odata |
OData4 | Translator | Yes | odata4 |
OpenAPI | Translator | Yes | openapi |
For a complete list of the properties that you can set to control how data is translated between REST-based services and a virtual database, see the OData, OData V4, OpenAPI, and Web Services Translator sections in the Translators section of the Teiid Reference Guide.
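For example, the spec/env entries for a hypothetical OData service named ODATASVC that is secured with openid-connect might resemble the following sketch. The endpoint, client values, and identity provider URLs are placeholders.
spec:
  env:
    - name: SPRING_TEIID_REST_ODATASVC_ENDPOINT
      value: https://www.example.com/odata/service/
    - name: SPRING_TEIID_REST_ODATASVC_SECURITY_TYPE
      value: openid-connect
    - name: SPRING_TEIID_REST_ODATASVC_CLIENT_ID
      value: my-client-id
    - name: SPRING_TEIID_REST_ODATASVC_CLIENT_SECRET
      value: my-client-secret
    - name: SPRING_TEIID_REST_ODATASVC_AUTHORIZE_URL
      value: https://sso.example.com/auth/realms/demo/protocol/openid-connect/auth
    - name: SPRING_TEIID_REST_ODATASVC_ACCESS_TOKEN_URL
      value: https://sso.example.com/auth/realms/demo/protocol/openid-connect/token
    - name: SPRING_TEIID_REST_ODATASVC_USER_NAME
      value: user
    - name: SPRING_TEIID_REST_ODATASVC_PASSWORD
      value: changeit
In the build/source/ddl section, the corresponding server uses the lower-case name odatasvc with the translator that matches the service type, for example odata4 for an OData V4 service.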
The OpenAPI translator assumes that the endpoint in the API document is set to the target location /openapi
, and it builds a source model that is based on that assumption. If the API endpoint is set to a different target, a configuration setting must be specified to enable the translator to locate the endpoint and import data correctly. The following examples show a DDL schema statement and an environment variable that you can set to specify a non-standard endpoint.
Sample DDL statement to specify a non-standard OpenAPI endpoint
CREATE SCHEMA sourceModel SERVER oService OPTIONS ("importer.metadataUrl" '/swagger.json');
Sample environment property to specify a non-standard OpenAPI endpoint
SPRING_TEIID_REST_OSERVICE_IMPORTER_METADATAURL=/swagger.json
3.4.4. Settings to configure Salesforce as a data source
The custom resource (CR) for a Salesforce database must contain specific information so that the data virtualization operator can create a virtual database from the database.
Salesforce uses OAuth 2.0 for authentication and authorization. Before a virtual database can import and query Salesforce data, you must obtain OAuth credentials for the virtual database from Salesforce. For information about how to set up OAuth, see Section 3.4.5, “Setting up an OAuth connection to Salesforce”.
The following tables list the properties that are required in the CR to create a virtual database that is based on a Salesforce database:
Property Name | Description | Required | Default value |
---|---|---|---|
SPRING_TEIID_DATA_SALESFORCE_<NAME>_USER_NAME∗ | User for salesforce.com | Yes | n/a |
SPRING_TEIID_DATA_SALESFORCE_<NAME>_PASSWORD∗ | Password for salesforce.com | Yes | n/a |
SPRING_TEIID_DATA_SALESFORCE_<NAME>_CLIENT_ID | ClientId from connected app | Yes | n/a |
SPRING_TEIID_DATA_SALESFORCE_<NAME>_CLIENT_SECRET | clientSecret from connected app | No | n/a |
SPRING_TEIID_DATA_SALESFORCE_<NAME>_REFRESH_TOKEN∗ | Refresh token for connected app | No | n/a |
∗If your connected app uses refresh tokens to authenticate, rather than name and password, your CR must specify the refresh token property, in place of the user name and password properties. Information about obtaining refresh tokens is beyond the scope of this document. For information about how to obtain a refresh token for your connected app, see the Salesforce documentation.
In the preceding table, NAME indicates the name, in upper case, that refers to the data source. The NAME value is also used in lower case in the build/source/ddl
section of the CR to represent the data source server.
Property Name | Description | Required | Value |
---|---|---|---|
FOREIGN DATA WRAPPER | Translator | Yes | salesforce |
For a complete list of the properties that you can set to control how data is translated between Salesforce and a virtual database, see the Teiid Reference Guide.
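For example, the spec/env entries for a hypothetical Salesforce source named SF might look like the following sketch; the user name and the consumer key and secret values come from your own Salesforce account and connected app (see Section 3.4.5).
spec:
  env:
    - name: SPRING_TEIID_DATA_SALESFORCE_SF_USER_NAME
      value: user@example.com
    - name: SPRING_TEIID_DATA_SALESFORCE_SF_PASSWORD
      value: changeit
    - name: SPRING_TEIID_DATA_SALESFORCE_SF_CLIENT_ID
      value: <consumer-key-from-connected-app>
    - name: SPRING_TEIID_DATA_SALESFORCE_SF_CLIENT_SECRET
      value: <consumer-secret-from-connected-app>
In the DDL, the corresponding server uses the salesforce foreign data wrapper and the lower-case name sf.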
3.4.5. Setting up an OAuth connection to Salesforce
Before the data virtualization service can retrieve data from a Salesforce database, you must configure the service as an OAuth-enabled connected app in Salesforce. After you configure OAuth, Salesforce generates a client ID and client secret that you must add to the CR file that defines the connection from the virtual database to Salesforce.
To configure OAuth, you create a connected app in Salesforce that can request access to Salesforce data on behalf of the data virtualization service. In the settings for the connected app, you enable integration with the Salesforce API by using OAuth 2.0.
Prerequisites
- You have a Salesforce.com account that has access to the data that you want to integrate in a virtual database.
The following steps are based on Salesforce Classic. If you use a different version of Salesforce, you might use a different procedure. For more information about creating connected apps in Salesforce, see the Salesforce documentation.
Procedure
- From Salesforce, log in to your account.
- Click Setup in the profile menu.
- In the Build section of the navigation sidebar, expand Create, and then click Apps.
- In the Connected Apps section, click New.
- Complete the required fields.
- In the section API (Enable OAuth Settings), select Enable OAuth Settings to display the OAuth settings.
- Complete the required OAuth fields. In the OAuth Scopes field, you must select the following scopes:
- Access and manage your data (api).
- Access your basic information (id, profile, email, address, phone).
- Allow access to your unique identifier (openid).
- Full access (full).
- Perform requests on your behalf at any time (refresh_token, offline_access).
- Select Require Secret for Web Server Flow.
- Click Save and then click Continue.
- Make a note of the values in the Consumer Key and Consumer Secret fields. These values are required for properties in the CR that specifies how the virtual database connects to Salesforce.
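Rather than pasting the consumer key and consumer secret directly into the CR, you can store them in an OpenShift secret and reference them from the spec/env section of the CR (see Section 4.2, “Creating and deploying secrets”). The following sketch shows one way to define such a secret; the secret name and keys are placeholders.
apiVersion: v1
kind: Secret
metadata:
  name: salesforce-oauth
type: Opaque
stringData:
  client-id: <consumer-key>
  client-secret: <consumer-secret>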