Chapter 1. Overview of machine learning features and Feature Store
A machine learning (ML) feature is a measurable property or attribute within a dataset that a machine learning model can analyze to learn patterns and make decisions. Examples of features include a customer’s purchase history, demographic data like age and location, weather conditions, and financial market data. You can use these features to train models for tasks such as personalized product recommendations, fraud detection, and predictive maintenance.
Feature Store is a Red Hat OpenShift AI component that provides a centralized repository that stores, manages, and serves machine learning features for both training and inference purposes.
1.1. Audience for Feature Store Copy linkLink copied to clipboard!
The target audience for Feature Store is ML platform and MLOps teams with DevOps experience in deploying real-time models to production. Feature Store also helps these teams build a feature platform that improves collaboration between data engineers, software engineers, machine learning engineers, and data scientists.
- For Data Scientists
- Feature Store is a tool where you can define, store, and retrieve your features for both model development and model deployment. By using Feature Store, you can focus on what you do best: build features that power your AI/ML models and maximize the value of your data.
- For MLOps Engineers
- Feature Store is a library that connects your existing infrastructure, such as online database, application server, microservice, analytical database, and orchestration tooling. By using Feature Store, you can focus on maintaining a resilient system, instead of implementing features for data scientists.
- For Data Engineers
- Feature Store provides a centralized catalog for storing feature definitions, allowing you to maintain a single source of truth for feature data. It provides the abstraction for reading and writing to many different types of offline and online data stores. Using the provided Python SDK or the feature server service, you can write data to the online and offline stores and then read out that data in either batch scenarios for model training or low-latency online scenarios for model inference.
- For AI Engineers
- Feature Store provides a platform designed to scale your AI applications by enabling seamless integration of richer data and facilitating fine-tuning. With Feature Store, you can optimize the performance of your AI models while ensuring a scalable and efficient data pipeline.
1.2. Overview of machine learning features Copy linkLink copied to clipboard!
In machine learning, a feature, also referred to as a field, is an individual measurable property. A feature is used as an input signal to a predictive model. For example, if a bank’s loan department is trying to predict whether an applicant should be approved for a loan, a useful feature might be whether they have filed for bankruptcy in the past or how much credit card debt they currently carry.
| customer_id | avg_cc_balance | credit_score | bankruptcy |
| 1005 | 500.00 | 730 | 0 |
| 982 | 20000.00 | 570 | 2 |
| 1001 | 1400.00 | 600 | 0 |
Features are prepared data that help machine learning models understand patterns in the world. Feature engineering is the process of selecting, manipulating, and transforming raw data into features that can be used in supervised learning. As shown in the table, a feature refers to an entire column in a dataset, for example, credit_score. A feature value refers to a single value in a feature column, such as 730.
1.3. Overview of Feature Store Copy linkLink copied to clipboard!
Feature Store is an OpenShift AI component that provides an interface between models and data. It is based on the Feast open source project. Feature Store provides a framework for storing, managing, and serving features to machine learning models by using your existing infrastructure and data stores. It facilitates the retrieval of feature data from different data sources to generate and manage features by providing unified feature management capabilities.
The following figure shows where Feature Store fits in the ML workflow. In an ML workflow, features are inputs to ML models. The ML workflow starts with many types of relevant data, such as transactional data, customer references, and product data. The data comes from a variety of databases and data sources. From this data, ML engineers use Feature Store to curate features. The features are input to models and the models can then use the data from the features to make predictions.
Figure 1.1. Feature Store in the ML workflow
Feature Store is a machine learning data system that provides the following capabilities:
- Runs data pipelines that transform raw data into feature values
- Stores and manages feature data
- Serves feature data consistently for training and inference purposes
- Manages features consistently across offline and online environments
- Powers one model or thousands simultaneously with fresh, reusable features, on demand
Feature Store is a centralized hub for storing, processing, and accessing commonly-used features that enables users in your ML organization to collaborate. When you register a feature in a Feature Store, it becomes available for immediate reuse by other models across your organization. The Feature Store registry reduces duplication of data engineering efforts and allows new ML projects to bootstrap with a library of curated, production-ready features.
Feature Store provides consistency in model training and inference, promotes collaboration and usability across multiple projects, monitors lineage and versioning of models for data drifts, leaks, and training skews, and seamlessly integrates with other MLOps tools. Feature Store remotely manages data stored in other systems, such as BigQuery, Snowflake, DynamoDB, and Redis, to make features consistently available at training / serving time.
Feature Store performs the following tasks:
- Stores features in offline and online stores
- Registers features in the registry for sharing
- Serves features to ML models
ML platform teams use Feature Store to store and serve features consistently for offline training, such as batch-scoring, and online real-time model inference.
Feature Store consists of the following key components:
- Registry
- A central catalog of all feature definitions and their related metadata. It allows ML engineers and data scientists to search, discover, and collaborate on new features. The registry exposes methods to apply, list, retrieve, and delete features.
- Offline Store
- The data store that contains historical data for scale-out batch scoring or model training. The offline store persists batch data that has been ingested into Feature Store. This data is used for producing training datasets. Examples of offline stores include Dask, Snowflake, BigQuery, Redshift, and DuckDB.
- Online Store
- The data store that is used for low-latency feature retrieval. The online store is used for real-time inference. Examples of online stores include Redis, GCP Datastore, and DynamoDB.
- Server
A feature server that serves pre-computed features online. There are three Feature Store servers:
- The online feature server - A Python feature server that is an HTTP endpoint that serves features with JSON I/O. You can write and read features from the online store using any programming language that can make HTTP requests.
- The offline feature server - An Apache Arrow Flight Server that uses the gRPC communication protocol to exchange data. This server wraps calls to existing offline store implementations and exposes interfaces as Arrow Flight endpoints.
- The registry server - A server that uses the gRPC communication protocol to exchange data. You can communicate with the server using any programming language that can make gRPC requests.
- UI
- A web-based graphical user interface (UI) for viewing all the Feature Store objects and their relationships with each other.
Feature Store provides the following software capabilities:
- A Python SDK for programmatically defining features and data sources
- A Python SDK for reading and writing features to offline and online data stores
- An optional feature server for reading and writing features (useful for non-python languages) by using APIs
- A web-based UI for viewing and exploring information about features defined in the project
- A command line interface (CLI) for viewing and updating feature information
1.4. Feature Store workflow Copy linkLink copied to clipboard!
The Feature Store workflow involves the following tasks OpenShift cluster administrators, and machine learning (ML) engineers or data scientists:
Note: This Feature Store workflow describes a local implementation that is available in this Technology Preview release.
Cluster administrator
Installs and configures Feature Store, as described in Chapter 2. Configuring Feature Store:
- Installs OpenShift AI.
- Enables the Feature Store component by using the Feature Store operator.
- Creates a project.
-
In the project, creates a Feature Store instance by using a
feast.yamlfile that specifies the offline and online stores. - Sets up Feature Store so that ML Engineers and data scientists can push and retrieve features to use for model training and inference.
ML Engineer or data scientist
Prepares features, as described in Chapter 3: Defining features:
- Creates a feature definition file.
- Defines the data sources and other Feature Store objects.
- Makes features available for real-time inference.
Prepares features for model training and real-time inference, as described in Chapter 4. Retrieving features for model training:
- Makes features available to models.
-
Uses
feastPython APIs to retrieve features for model training and inference.
1.5. Setting up the Feature Store user interface for initial use Copy linkLink copied to clipboard!
You can use a web-based user interface to simplify and accelerate the creation of model development features. This visual interface helps you explore and understand your Feature Store.
You can use the Feature Store UI to access a centralized catalog of features and metadata, such as transformation logic and materialization job status. You can also view features, manage entities, and use lineage and search capabilities.
You must enable the Feature Store UI before you can use it.
Prerequisites
- You have Administrator access.
- You have enabled the Feast Operator.
- You have created a Feature Store CRD, as described in Creating a Feature Store instance in a project.
- Your REST API server is running.
Procedure
- Log in to the OpenShift AI Dashboard and click the Feature Store tab on the left navigation.
- On the Overview page, click the create FeatureStore button.
- Add YAML definitions to enable the user interface.
Edit the following label to enable the UI:
-
Label: -
feature-store-ui: enabled - This creates a pod and initiates the service registry.
-
- Click the options icon (⋮) and choose Start job.
- Click on the Jobs tab and then Logs on the left navigation, to confirm that the CronJob is running.
- Navigate to the Feature views tab on the left navigation, and you will be able to see your new UI.
Verification
Click Overview. If your Feature Store user interface was created, you see the following cards: * Entities * Data sources * Datasets * Features * Feature views * Feature services
Additional resources
1.6. Additional resources Copy linkLink copied to clipboard!
- For example Feature Store CRD configurations, see the Feast Operator configuration samples.
- For details about the Feast CRD APIs, see the Feast API documentation.
- For information on how to implement machine learning features, see the Feast documentation.
- For end-to-end use case examples of how Feature Store can benefit your AI/ML workflows, see Feast Getting Started: Use Cases.