
Chapter 3. Defining machine learning features


As part of the Feature Store workflow, machine learning (ML) engineers or data scientists are responsible for identifying data sources and defining features of interest.

3.1. Setting up your working environment

You must set up your Red Hat OpenShift AI working environment so that you can use features in your machine learning workflow.

Prerequisites

  • You have access to the OpenShift AI data science project in which your cluster administrator has set up the Feature Store instance.

Procedure

  1. From the OpenShift AI dashboard, click Data science projects.
  2. Click the name of the project in which your cluster administrator has set up the Feature Store instance.
  3. In the data science project in which the cluster administrator set up Feature Store, create a workbench, as described in Creating a workbench.
  4. To open the IDE (for example, JupyterLab), in a new window, click the open icon next to the workbench.
  5. Add a feature_store.yaml file to your notebook environment. For example, upload a local file or clone a Git repo that contains the file, as described in Uploading an existing notebook file to JupyterLab from a Git repository by using the CLI.
  6. Open a new Python notebook.
  7. In a cell, run the following command to install the feast CLI:

    ! pip install feast

Verification

  1. Run the following command to list the available features:

    ! feast features list

    The output shows each feature with its feature view and data type, similar to the following:

    Feature             Feature View        Data Type
    credit_card_due     credit_history      Int64
    mortgage_due        credit_history      Int64
    student_loan_due    credit_history      Int64
    vehicle_loan_due    credit_history      Int64
    city                zipcode_features    String
    state               zipcode_features    String
    location_type       zipcode_features    String
  2. Optionally, run the following commands to list the registered feast projects, feature views, and entities.

    ! feast projects list
    
    ! feast feature-views list
    
    ! feast entities list

3.2. About feature definitions

A machine learning feature is a measurable property or field within a data set that a machine learning model can analyze to learn patterns and make decisions. In Feature Store, you define a feature by defining the name and data type of a field.

A feature definition is a schema that includes the field name and data type, as shown in the following example:

from feast import Field
from feast.types import Int64

credit_card_amount_due = Field(
    name="credit_card_amount_due",
    dtype=Int64
)

For a list of supported data types for fields in Feature Store, see the feast.types module in the Feast documentation.

In addition to field name and data type, a feature definition can include additional metadata, specified as descriptions of features, as shown in the following example:

from feast import Field
from feast.types import Int64


credit_card_amount_due = Field(
    name="credit_card_amount_due",
    dtype=Int64,
    description="Credit card amount due for user",
    tags={"team": "loan_department"},
)

3.3. Specifying the data source for features

As an ML engineer or a data scientist, you must specify the data source for the features that you want to define.

The data source differs depending on whether you are using an offline store, for batch data and training data sets, or an online store, for model inference. For file-based sources, you can use a Parquet-formatted or a Delta-formatted file. The file can be local or in object storage, such as Amazon Simple Storage Service (S3).

For offline stores, specify a batch data source. You can specify a data warehouse, such as BigQuery, Snowflake, or Redshift, or a data lake, such as Amazon S3 or Google Cloud Storage (GCS). You can use Feature Store to ingest and query data across both types of data sources.

For online stores, specify a database backend, such as Redis, GCP Datastore, or DynamoDB.
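The store backends are configured in the feature_store.yaml file for the project. The following is a minimal sketch that pairs a file-based offline store with a Redis online store; the project name, registry path, and connection string are illustrative values, not defaults:

```yaml
project: my_feature_project        # illustrative project name
provider: local
registry: data/registry.db         # illustrative registry path
offline_store:
  type: file                       # file-based batch sources, such as Parquet
online_store:
  type: redis
  connection_string: "localhost:6379"   # illustrative Redis address
```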

Prerequisites

  • You know the location of the data source for your ML workflow.

Procedure

  1. In the editor of your choice, create a new Python file.
  2. At the beginning of the file, specify the data source for the features that you want to define within the file.

    For example, use the following code to specify the data source as a Parquet-formatted file:

    from feast import FileSource
    from feast.data_format import ParquetFormat
    
    parquet_file_source = FileSource(
        file_format=ParquetFormat(),
        path="file:///feast/customer.parquet",
    )
  3. Save the file.

3.4. About organizing features by using entities

Within a feature view, you can group features that share a conceptual link or relationship together to define an entity. You can think of an entity as a primary key that you can use to fetch features. Typically, an entity maps to the domain of your use case. For example, a fraud detection use case could have customers and transactions as its entities, and group related features that correspond to those customers and transactions.

A feature does not have to be associated with an entity. For example, a feature of a customer entity could be the number of transactions they have made in an average month, while a feature that is not tied to a specific entity could be the total number of transactions made by all users in the last month.

from feast import Entity

customer = Entity(name='dob_ssn', join_keys=['dob_ssn'])

The entity name uniquely identifies the entity. The join key identifies the physical primary key on which feature values are joined together for feature retrieval.
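Conceptually, the join key works like a dictionary key: given a join key value, you can fetch the feature values stored for that entity. The following plain-Python sketch (not the Feast API) illustrates the idea, using values from Table 3.1:

```python
# Plain-Python sketch (not the Feast API): the entity join key acts as a
# primary key that maps each entity row to its feature values.
feature_rows = {
    "19530219_5179": {"credit_card_due": 833, "bankruptcies": 0},
    "19500806_6783": {"credit_card_due": 1297, "bankruptcies": 0},
}

def fetch_features(join_key_value, feature_names):
    """Look up the requested features for one entity by its join key."""
    row = feature_rows[join_key_value]
    return {name: row[name] for name in feature_names}

print(fetch_features("19530219_5179", ["credit_card_due"]))
# → {'credit_card_due': 833}
```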

The following table shows example data with a single entity column (dob_ssn) and two feature columns (credit_card_due and bankruptcies).

Table 3.1. Example data showing an entity and features

row  timestamp          dob_ssn        credit_card_due  bankruptcies
1    5/22/2025 0:00:00  19530219_5179  833              0
2    5/22/2025 0:00:00  19500806_6783  1297             0
3    5/22/2025 0:00:00  19690214_3370  3912             1
4    5/22/2025 0:00:00  19570513_7405  8840             0

3.5. Creating feature views

You define features within a feature view. A feature view is an object that represents a logical group of time-series feature data in a data source. Feature views indicate to Feature Store where to find your feature values, for example, in a Parquet file or a BigQuery table.

By using feature views, you define the existing feature data in a consistent way for both an offline environment, when you train your models, and an online environment, when you want to serve features to models in production.

Feature Store uses feature views during the following tasks:

  • Generating training datasets by querying the data source of feature views to find historical feature values. A single training data set can consist of features from multiple feature views.
  • Loading feature values into an online or offline store. Feature views determine the storage schema in the online or offline store. Feature values can be loaded from batch sources or from stream sources.
  • Retrieving features from the online or offline store. Feature views provide the schema definition for looking up features from the online or offline store.
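The first of these tasks, generating training data sets, is a point-in-time lookup: for each entity and timestamp, Feature Store finds the most recent feature values at or before that timestamp, within the TTL of the feature view. The following plain-Python sketch (not the Feast API; the data is illustrative) shows the idea:

```python
from datetime import datetime, timedelta

# Illustrative historical feature values for one entity, oldest first.
history = [
    (datetime(2025, 3, 1), {"credit_card_due": 700}),
    (datetime(2025, 5, 22), {"credit_card_due": 833}),
]

def point_in_time_lookup(history, as_of, ttl=timedelta(days=90)):
    """Return the latest feature values at or before `as_of`, within the TTL."""
    candidates = [(ts, vals) for ts, vals in history
                  if ts <= as_of and as_of - ts <= ttl]
    if not candidates:
        return None
    # Pick the entry with the most recent timestamp.
    return max(candidates, key=lambda pair: pair[0])[1]

print(point_in_time_lookup(history, datetime(2025, 6, 1)))
# → {'credit_card_due': 833}
```

Note that an older value that has fallen outside the TTL window is never returned, which is why limiting the TTL bounds how far back historical retrieval looks.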

When you create a feature project, the feature_repo subfolder includes a Python file that contains example feature definitions (for example, example_features.py).

To define new features, you can edit the code in the example file or add a new file to the feature repository.

Note: Feature views only work with timestamped data. If your data does not contain timestamps, insert dummy timestamps. The following example shows how to create a table with dummy timestamps for PostgreSQL-based data:

CREATE TABLE employee_metadata (
  employee_id INT PRIMARY KEY,
  department TEXT,
  dummy_event_timestamp TIMESTAMP DEFAULT '2024-01-01'
);
INSERT INTO employee_metadata (employee_id, department)
VALUES (1, 'Advanced'), (2, 'New');
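Because the INSERT statement omits the timestamp column, the column default supplies the dummy value for every row. The following sketch exercises equivalent SQL against an in-memory SQLite database to show the effect (the documented example targets PostgreSQL, but this subset of SQL is portable):

```python
import sqlite3

# Create the table with a defaulted dummy timestamp and insert rows
# without specifying the timestamp column.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee_metadata (
  employee_id INT PRIMARY KEY,
  department TEXT,
  dummy_event_timestamp TIMESTAMP DEFAULT '2024-01-01'
);
INSERT INTO employee_metadata (employee_id, department)
VALUES (1, 'Advanced'), (2, 'New');
""")

rows = conn.execute(
    "SELECT employee_id, department, dummy_event_timestamp FROM employee_metadata"
).fetchall()
print(rows)
# → [(1, 'Advanced', '2024-01-01'), (2, 'New', '2024-01-01')]
```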

Prerequisites

  • You know what data is relevant to your use case.
  • You have identified attributes in your data that you want to use as features in your ML models.

Procedure

  1. In your IDE, such as JupyterLab, open the feature_repo/example_features.py file that contains example feature definitions or create a new Python (.py) file in the feature_repo directory.
  2. Create a feature view that is relevant to your use case based on the structure shown in the following example:

    from datetime import timedelta

    from feast import Entity, FeatureView, Field, FileSource
    from feast.data_format import ParquetFormat
    from feast.types import Int64

    # Entity with the dob_ssn join key, as described in Section 3.4.
    dob_ssn = Entity(name="dob_ssn", join_keys=["dob_ssn"])

    credit_history_source = FileSource(     # 1
        name="Credit history",
        path="data/credit_history.parquet",
        file_format=ParquetFormat(),
        timestamp_field="event_timestamp",
        created_timestamp_column="created_timestamp",
    )

    credit_history = FeatureView(           # 2
        name="credit_history",
        entities=[dob_ssn],                 # 3
        ttl=timedelta(days=90),             # 4
        schema=[                            # 5
            Field(name="credit_card_due", dtype=Int64),
            Field(name="mortgage_due", dtype=Int64),
            Field(name="student_loan_due", dtype=Int64),
            Field(name="vehicle_loan_due", dtype=Int64),
            Field(name="hard_pulls", dtype=Int64),
            Field(name="missed_payments_2y", dtype=Int64),
            Field(name="missed_payments_1y", dtype=Int64),
            Field(name="missed_payments_6m", dtype=Int64),
            Field(name="bankruptcies", dtype=Int64),
        ],
        source=credit_history_source,       # 6
        tags={"origin": "internet"},        # 7
    )
    1
    A data source that provides time-stamped tabular data. A feature view must always have a data source for the generation of training datasets and when materializing feature values into the online store. Possible data sources are batch data sources from data warehouses (BigQuery, Snowflake, Redshift), data lakes (S3, GCS), or stream sources. Users can push features from data sources into Feature Store, and make the features available for training or batch scoring ("offline"), for realtime feature serving ("online"), or both.
    2
    A name that identifies the feature view in the project. Within a feature view, feature names must be unique.
    3
    Zero or more entities. Feature views generally contain features that are properties of a specific object, in which case that object is defined as an entity and included in the feature view. If the features are not related to a specific object, the feature view might not have entities.
    4
    (Optional) Time-to-live (TTL) to limit how far back to look when Feature Store generates historical datasets.
    5
    One or more feature definitions.
    6
    A reference to the data source.
    7
    (Optional) You can add metadata, such as tags that enable filtering of features when you view them in the UI, list them by using a CLI command, or query the registry directly.
  3. Save the file.