Chapter 6. Building the virtual database
We now have a working postgreSQL database with some sample data in it. The next step is to define a virtual database that can read and write from the postgreSQL data source. We’ll also configure the REST data source that we mentioned earlier, an online live stock quote service.
The design of a virtual database is defined in a custom resource (CR) file. The CR is written as a YAML file, which contains a section of embedded DDL. After the CR YAML file is ready, you provide it as input to the Data Virtualization Operator, and the Operator deploys a virtual database based on the specification.
This tutorial uses a sample CR file, portfolio.yaml
, that is available from the Teiid OpenShift examples repository that you cloned earlier. If you want to get started right away, you can use the sample file as-is, and run the Operator. Just skip to Chapter 7, Deploying the virtual database to try it out.
Afterwards you can return here to review the next few sections to learn about how the CR is constructed.
6.1. Virtual database custom resources
To specify the provisioning instructions for a virtual database, you create a custom resource (CR) file that describes how to build the virtual database. The CR is a YAML file that defines the following elements of the virtual database:
- The configuration of the data sources.
- The SQL DDL specifies the structure of the database schema.
6.1.1. Data source configuration
For this tutorial the CR defines the following data sources:
accountdb
- Represents the postgreSQL database that you created earlier.
quotesvc
- Represents a REST API for retrieving stock quotes from the online quote service at https://finnhub.io/api/v1/
You must use lowercase letters to specify data source names.
The following CR excerpt specifies the data sources for the virtual database:
apiVersion: teiid.io/v1alpha1 kind: VirtualDatabase metadata: name: portfolio 1 spec: replicas: 1 datasources: - name: accountdb 2 type: postgresql properties: - name: username value: user - name: password value: changeit - name: jdbc-url value: jdbc:postgresql://accounts/accounts - name: quotesvc 3 type: rest properties: - name: endpoint value: https://finnhub.io/api/v1/
- 1
- Defines the resource name for the virtual database when it is deployed to OpenShift.
- 2
- Represents the postgreSQL database and assigns to it the name
accountdb
. - 3
- Represents a REST connection to the online stock quote service at https://finnhub.io/api/v1/, and assigns to it the name
quotesvc
.
The stock quote service that serves as a data source in this tutorial is intended to help demonstrate how virtual databases can integrate data from REST services. The service is not a component of Red Hat Integration data virtualization and is not affiliated with Red Hat in any way.
To use the stock quote service, visit the web site to request a free API key. After you obtain your key, add its value to the portfolio.yaml
to the CR file that you obtain from the Teiid OpenShift Examples repository. Add your API key in place of the STOCK_QUOTE_API_KEY variable in the file.
6.1.2. DDL for defining the virtual database
We embed SQL DDL in our CR to define the structure of the virtual database. For example, the DDL specifies the following properties:
- The translators to use for standardizing the format of the source data.
- The source schema to use for representing data elements in the data sources.
- The metadata (tables, procedures, and so forth) that we want to import from the data source.
- The virtual schema to use to map elements in the data sources to elements in the virtual database.
The next few sections provide more detail about how and where to define these properties in the DDL. After reviewing the individual sections of the file we’ll look at how they are assembled in the final CR.
After the DDL is complete, we insert it into a section of the CR, as shown in the following example:
spec:
build:
source:
ddl: |
PLACEHOLDER_FOR_VIRTUAL_DATABASE_DDL
6.1.2.1. Virtual database creation
The CREATE DATABASE
and USE DATABASE
statements specify the name of the virtual database and indicate that subsequent SQL operations in the DDL apply to this database.
The DDL in our CR uses the following statements to indicate that we’ll create and use a database called Portfolio for our virtual database.
CREATE DATABASE Portfolio OPTIONS (ANNOTATION 'The Portfolio VDB'); USE DATABASE Portfolio;
In this case, the CREATE DATABASE
statement assigns a database name (Portfolio
) that matches the name portfolio
that we assigned to the metadata:name
property in the CR:
apiVersion: teiid.io/v1alpha1 kind: VirtualDatabase metadata: name: portfolio
While it is not required that the two names match, using the same name makes it easier to understand that the two entities are related.
The annotation in the CREATE DATABASE
statement assigns a metadata description to the database. The metadata is purely descriptive and it could be used as a label in a user interface. It is not directly used during the virtual database build or deployment process.
6.1.2.2. Translator definition
A virtual database uses adapters, known as translators or data wrappers, to convert the data that it receives from different types of external data sources into a standard format. Data virtualization uses a wide range of data wrappers. The DDL for our virtual database uses the following SQL statements to define data wrappers for the postgreSQL and REST data sources that we’re using in this tutorial:
CREATE FOREIGN DATA WRAPPER rest; CREATE FOREIGN DATA WRAPPER postgresql;
6.1.2.3. External data source definitions
Our virtual database connects to external two data sources to retrieve data. In the preceding sections, we defined the translators and the connection properties for these data sources. Now we’ll add the DDL to define two SERVER
elements to represent the connections to the data sources. The names that we assign to the SERVER
objects tie them to the datasource
properties that we specified earlier.
The SQL in this section also defines a FOREIGN DATA WRAPPER
(FDW) for each connection. We add these foreign data wrappers to enable the virtual database to manage the data that is stored on these external servers. By defining these FDWs for the virtual database we enable it to query the postgreSQL and REST data sources as if they were local tables.
CREATE SERVER "accountdb" FOREIGN DATA WRAPPER postgresql; CREATE SERVER "quotesvc" FOREIGN DATA WRAPPER rest;
6.1.2.4. Schema creation
A virtual database is a collection of multiple schemas. These schemas can be physical, representing a foreign data source such as our postgreSQL database, or they can be virtual as with the schema that provides in the data virtualization layer.
CREATE SCHEMA marketdata SERVER "quotesvc"; CREATE SCHEMA accounts SERVER "accountdb"; CREATE VIRTUAL SCHEMA Portfolio;
Physical schema such as the marketdata
and accounts
schema that we define here, represent data elements in the external data sources. The physical schema definitions in the preceding DDL excerpt refer to the SERVER
objects that we defined for those external data sources in Section 6.1.2.3, “External data source definitions”. The final statement in the preceding excerpt establishes a virtual schema for our Portfolio virtual database.
6.1.2.5. Metadata import
In Section 6.1.2.3, “External data source definitions” we defined the foreign schema and associated them with their corresponding SERVER
objects. However, we have not provided the data virtualization service with the information that it needs to work with the schema in the foreign data sources. To provide that information, we add the following statements to our DDL:
SET SCHEMA marketdata; IMPORT FROM SERVER "quotesvc" INTO marketdata; SET SCHEMA accounts; IMPORT FROM SERVER "accountdb" INTO accounts OPTIONS ( "importer.useFullSchemaName" 'false', "importer.tableTypes" 'TABLE,VIEW');
In the preceding DDL excerpt we import metadata from our data sources into the schema that we created in Section 6.1.2.4, “Schema creation”.
-
The metadata from the
quotesvc
data source server is imported into themarketdata
foreign schema. -
The metadata from the
accountdb
data source server is imported into theaccounts
foreign schema.
You can specify the full metadata for tables, procedures, and so forth. However, for the purposes of this tutorial, we will dynamically import that information.
6.1.2.6. Virtual schema definition
The final section of the DDL defines the data abstraction layer of the virtual database. Here we define a schema that uses the elements of the metadata that we defined in the preceding PHYSICAL schema. This virtual schema is where the heavy lifting occurs. It defines the logic for combining information from our source tables.
You can define any number of virtual schema layers, but for simplicity, in this example we create a single layer.
In this final portion of the DDL, we create the following virtual views:
StockPrice
-
This view retrieves stock prices from the
quotessvc
, our REST service that provides real-time stock quotes. AccountValues
-
This view uses the value obtained from the
StockPrice
view to calculate the portfolio values of the customers listed in theAccounts
postgreSQL database. CustomerHolding
- This view shows the value of individual customer accounts based on their stock holdings.
SET SCHEMA Portfolio;
CREATE VIEW StockPrice (
symbol string PRIMARY KEY,
price double,
CONSTRAINT ACS ACCESSPATTERN (symbol)
) AS
SELECT p.symbol, y.price
FROM accounts.PRODUCT as p, TABLE(call invokeHttp(action=>'GET', endpoint=>QUERYSTRING('quote', p.symbol as "symbol", `STOCK_QUOTE_API_KEY` as "token"), headers=>jsonObject('application/json' as "Content-Type"))) as x,
JSONTABLE(JSONPARSE(x.result,true), '$' COLUMNS price double path '@.c') as y
CREATE VIEW AccountValues (
LastName string PRIMARY KEY,
FirstName string,
StockValue double
) AS
SELECT c.lastname as LastName, c.firstname as FirstName, sum((h.shares_count*sp.price)) as StockValue
FROM Customer c JOIN Account a on c.SSN=a.SSN
JOIN Holdings h on a.account_id = h.account_id
JOIN product p on h.product_id=p.id
JOIN StockPrice sp on sp.symbol = p.symbol
WHERE a.type='Active'
GROUP BY c.lastname, c.firstname;
CREATE VIEW CustomerHoldings (
LastName string PRIMARY KEY,
FirstName string,
symbol string,
ShareCount integer,
StockValue double,
CONSTRAINT ACS ACCESSPATTERN (LastName)
) AS
SELECT c.lastname as LastName, c.firstname as FirstName, p.symbol as symbol, h.shares_count as ShareCount, (h.shares_count*sp.price) as StockValue
FROM Customer c JOIN Account a on c.SSN=a.SSN
JOIN Holdings h on a.account_id = h.account_id
JOIN product p on h.product_id=p.id
JOIN StockPrice sp on sp.symbol = p.symbol
WHERE a.type='Active';
That completes our tour of the virtual database CR. See Section 6.2, “Completed virtual database custom resource file” to review the CR in its entirety.
6.2. Completed virtual database custom resource file
Bringing together the individual sections of the custom resource file that we reviewed in the preceding sections, we can now review the complete portfolio.yaml
CR file:
portfolio.yaml
apiVersion: teiid.io/v1alpha1 kind: VirtualDatabase metadata: name: portfolio spec: replicas: 1 datasources: - name: accountdb type: postgresql properties: - name: username value: user - name: password value: password - name: jdbc-url value: jdbc:postgresql://accounts/accounts - name: quotesvc type: rest properties: - name: endpoint value: https://finnhub.io/api/v1/ build: source: ddl: | CREATE DATABASE Portfolio OPTIONS (ANNOTATION 'The Portfolio VDB'); USE DATABASE Portfolio; -- translators CREATE FOREIGN DATA WRAPPER rest; CREATE FOREIGN DATA WRAPPER postgresql; -- Servers CREATE SERVER "accountdb" FOREIGN DATA WRAPPER postgresql; CREATE SERVER "quotesvc" FOREIGN DATA WRAPPER rest; -- Schemas CREATE SCHEMA marketdata SERVER "quotesvc"; CREATE SCHEMA accounts SERVER "accountdb"; CREATE VIRTUAL SCHEMA Portfolio; -- Schema:marketdata SET SCHEMA marketdata; IMPORT FROM SERVER "quotesvc" INTO marketdata; -- Schema:accounts SET SCHEMA accounts; IMPORT FROM SERVER "accountdb" INTO accounts OPTIONS ( "importer.useFullSchemaName" 'false', "importer.tableTypes" 'TABLE,VIEW'); -- Schema:Portfolio SET SCHEMA Portfolio; CREATE VIEW StockPrice ( symbol string, price double, CONSTRAINT ACS ACCESSPATTERN (symbol) ) AS SELECT p.symbol, y.price FROM accounts.PRODUCT as p, TABLE(call invokeHttp(action=>'GET', endpoint=>QUERYSTRING('quote', p.symbol as "symbol", `STOCK_QUOTE_API_KEY` as "token"), headers=>jsonObject('application/json' as "Content-Type"))) as x, JSONTABLE(JSONPARSE(x.result,true), '$' COLUMNS price double path '@.c') as y CREATE VIEW AccountValues ( LastName string PRIMARY KEY, FirstName string, StockValue double ) AS SELECT c.lastname as LastName, c.firstname as FirstName, sum((h.shares_count*sp.price)) as StockValue FROM Customer c JOIN Account a on c.SSN=a.SSN JOIN Holdings h on a.account_id = h.account_id JOIN product p on h.product_id=p.id JOIN StockPrice sp on sp.symbol = p.symbol WHERE a.type='Active' GROUP BY c.lastname, c.firstname;
We can now provide the CR to the Data Virtualization Operator to build and deploy the virtual database.