Red Hat OpenShift Data Foundation architecture


Red Hat OpenShift Data Foundation 4.19

Overview of OpenShift Data Foundation architecture and the roles that the components and services perform.

Red Hat Storage Documentation Team

Abstract

This document provides an overview of the OpenShift Data Foundation architecture.

Preface

This document provides an overview of the OpenShift Data Foundation architecture.

Providing feedback on Red Hat documentation

We appreciate your input on our documentation. Do let us know how we can make it better.

To give feedback, create a Jira ticket:

  1. Log in to Jira.
  2. Click Create in the top navigation bar.
  3. Enter a descriptive title in the Summary field.
  4. Enter your suggestion for improvement in the Description field. Include links to the relevant parts of the documentation.
  5. Select Documentation in the Components field.
  6. Click Create at the bottom of the dialog.

Red Hat OpenShift Data Foundation is a highly integrated collection of cloud storage and data services for Red Hat OpenShift Container Platform. It is available as part of the Red Hat OpenShift Container Platform Service Catalog, packaged as an operator to facilitate simple deployment and management.

Red Hat OpenShift Data Foundation services are primarily made available to applications by way of storage classes that represent the following components:

  • Block storage devices, catering primarily to database workloads. Prime examples include Red Hat OpenShift Container Platform logging and monitoring, and PostgreSQL.
Important

Use block storage for a workload only when it does not require sharing the data across multiple containers.

  • Shared and distributed file system, catering primarily to software development, messaging, and data aggregation workloads. Examples include Jenkins build sources and artifacts, Wordpress uploaded content, Red Hat OpenShift Container Platform registry, and messaging using JBoss AMQ.
  • Multicloud object storage, featuring a lightweight S3 API endpoint that can abstract the storage and retrieval of data from multiple cloud object stores.
  • On-premises object storage, featuring a robust S3 API endpoint that scales to tens of petabytes and billions of objects, primarily targeting data-intensive applications. Examples include the storage and access of row, columnar, and semi-structured data with applications like Spark, Presto, Red Hat AMQ Streams (Kafka), and even machine learning frameworks like TensorFlow and PyTorch.
Note

Running PostgreSQL workloads on CephFS persistent volumes is not supported. It is recommended to use RADOS Block Device (RBD) volumes instead. For more information, see the knowledgebase solution ODF Database Workloads Must Not Use CephFS PVs/PVCs.
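For illustration, a claim against the block storage class for a database workload might look like the following. All names are illustrative, and the storage class name assumes a default internal-mode deployment:

```yaml
# Hypothetical PVC for a database workload; names are examples only.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgresql-data
  namespace: my-app
spec:
  accessModes:
    - ReadWriteOnce          # block storage; data is not shared across containers
  resources:
    requests:
      storage: 50Gi
  storageClassName: ocs-storagecluster-ceph-rbd
```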

Red Hat OpenShift Data Foundation version 4.x integrates a collection of software projects, including:

  • Ceph, providing block storage, a shared and distributed file system, and on-premises object storage
  • Ceph CSI, to manage provisioning and lifecycle of persistent volumes and claims
  • NooBaa, providing a Multicloud Object Gateway
  • OpenShift Data Foundation, Rook-Ceph, and NooBaa operators to initialize and manage OpenShift Data Foundation services.

Red Hat OpenShift Data Foundation provides services for, and can run internally on, Red Hat OpenShift Container Platform.

Figure 2.1. Red Hat OpenShift Data Foundation architecture


Red Hat OpenShift Data Foundation supports deployment into Red Hat OpenShift Container Platform clusters deployed on Installer Provisioned Infrastructure or User Provisioned Infrastructure. For details about these two approaches, see OpenShift Container Platform - Installation process. For more information about the interoperability of components for Red Hat OpenShift Data Foundation and Red Hat OpenShift Container Platform, see the interoperability matrix.

OpenShift Data Foundation uses failure domain configuration to determine how data replicas are distributed across the cluster. A failure domain represents a physical boundary, such as a host, rack, or availability zone within which replicas of the data are spread to ensure high availability. OpenShift Data Foundation automatically identifies these failure domains based on the labels assigned to the storage nodes. For more information, see the knowledgebase article, OpenShift Data Foundation internal topology aware failure domain configuration.
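As a sketch, the topology labels on a storage node from which OpenShift Data Foundation can derive failure domains might look like the following. The node name, zone, and region values are illustrative:

```yaml
# Illustrative node metadata; zone labels define the failure domain boundary.
apiVersion: v1
kind: Node
metadata:
  name: worker-0
  labels:
    kubernetes.io/hostname: worker-0
    topology.kubernetes.io/region: us-east-2
    topology.kubernetes.io/zone: us-east-2a   # zone-level failure domain
```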

In internal mode, OpenShift Data Foundation enables read‑affinity by default to improve performance. This feature directs read operations to the nearest available OSD, reducing latency and minimizing cross-failure domain traffic. This behavior is especially valuable in public cloud environments, where it helps reduce expensive data transfers between Availability Zones. In stretched cluster deployments, read‑affinity also prioritizes the local datacenter to ensure efficient, localized data access.

For information about the architecture and lifecycle of OpenShift Container Platform, see OpenShift Container Platform architecture.

Chapter 3. OpenShift Data Foundation operators

Red Hat OpenShift Data Foundation comprises the following three Operator Lifecycle Manager (OLM) operator bundles, deploying four operators which codify administrative tasks and custom resources so that task and resource characteristics can be easily automated:

  • OpenShift Data Foundation

    • odf-operator
  • OpenShift Container Storage

    • ocs-operator
    • rook-ceph-operator
  • Multicloud Object Gateway

    • mcg-operator

Administrators define the desired end state of the cluster, and the OpenShift Data Foundation operators ensure the cluster is either in that state or approaching that state, with minimal administrator intervention.

3.1. OpenShift Data Foundation operator

The odf-operator can be described as a "meta" operator for OpenShift Data Foundation, that is, an operator meant to influence other operators.

The odf-operator has the following primary functions:

  • Enforces the configuration and versioning of the other operators that comprise OpenShift Data Foundation. It does this by using two primary mechanisms: operator dependencies and Subscription management.

    • The odf-operator bundle specifies dependencies on other OLM operators to make sure they are always installed at specific versions.
    • The operator itself manages the Subscriptions for all other operators to make sure the desired versions of those operators are available for installation by the OLM.
  • Provides the OpenShift Data Foundation external plugin for the OpenShift Console.
  • Provides an API to integrate storage solutions with the OpenShift Console.
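The Subscriptions that the odf-operator manages are standard OLM resources. One can be sketched as follows; the channel and catalog source values are examples and vary by release and installation:

```yaml
# Illustrative OLM Subscription of the kind the odf-operator manages
# for its component operators.
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: ocs-operator
  namespace: openshift-storage
spec:
  name: ocs-operator
  channel: stable-4.19              # example channel
  source: redhat-operators          # example catalog source
  sourceNamespace: openshift-marketplace
  installPlanApproval: Automatic
```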

3.1.1. Components

The odf-operator has a dependency on the ocs-operator package. It also manages the Subscription of the mcg-operator. In addition, the odf-operator bundle defines a second Deployment for the OpenShift Data Foundation external plugin for the OpenShift Console. This defines an nginx-based Pod that serves the necessary files to register and integrate OpenShift Data Foundation dashboards directly into the OpenShift Container Platform Console.

3.1.2. Design diagram

This diagram illustrates how odf-operator is integrated with the OpenShift Container Platform.

Figure 3.1. OpenShift Data Foundation Operator


3.1.3. Resources

The odf-operator creates Operator Lifecycle Manager resources, including a Subscription, for each operator that it manages.

3.1.4. Limitation

The odf-operator does not provide any data storage or services itself. It exists as an integration and management layer for other storage systems.

3.1.5. High availability

As with most of the other operators, high availability is not a primary requirement for the odf-operator Pod. In general, there are no operations that require or benefit from process distribution. OpenShift Container Platform quickly spins up a replacement Pod whenever the current Pod becomes unavailable or is deleted.

3.1.6. Relevant config files

The odf-operator comes with a ConfigMap of variables that can be used to modify the behavior of the operator.

3.1.7. Relevant log files

To understand OpenShift Data Foundation and troubleshoot issues, you can look at the following:

  • Operator Pod logs
  • StorageCluster status

Operator Pod logs

Each operator provides standard Pod logs that include information about reconciliation and errors encountered. These logs often have information about successful reconciliation which can be filtered out and ignored.

StorageCluster status and events

The StorageCluster CR stores the reconciliation details in the status of the CR. The spec of the storage cluster contains the name, namespace, and kind of the actual storage cluster CRD, which the administrator can use to find further information on the status of the storage cluster.

3.1.8. Lifecycle

The odf-operator is required to be present as long as the OpenShift Data Foundation bundle remains installed. This is managed as part of OLM’s reconciliation of the OpenShift Data Foundation CSV. At least one instance of the pod should be in Ready state.

The operator operands such as CRDs should not affect the lifecycle of the operator. The creation and deletion of StorageClusters is an operation outside the operator’s control and must be initiated by the administrator or automated with the appropriate application programming interface (API) calls.

3.2. OpenShift Container Storage operator

The ocs-operator can be described as a "meta" operator for OpenShift Data Foundation, that is, an operator that influences other operators and serves as a configuration gateway for the features that the other operators provide. It does not directly manage the other operators.

The ocs-operator has the following primary functions:

  • Creates Custom Resources (CRs) that trigger the other operators to reconcile against them.
  • Abstracts the Ceph and Multicloud Object Gateway configurations and limits them to known best practices that are validated and supported by Red Hat.
  • Creates and reconciles the resources required to deploy containerized Ceph and NooBaa according to the support policies.

3.2.1. Components

The ocs-operator does not have any dependent components. However, the operator has a dependency on the existence of all the custom resource definitions (CRDs) from other operators, which are defined in the ClusterServiceVersion (CSV).

3.2.2. Design diagram

This diagram illustrates how OpenShift Container Storage is integrated with the OpenShift Container Platform.

Figure 3.2. OpenShift Container Storage Operator


3.2.3. Responsibilities

The two ocs-operator CRDs are:

  • OCSInitialization
  • StorageCluster

OCSInitialization is a singleton CRD used for encapsulating operations that apply at the operator level. The operator takes care of ensuring that one instance always exists. The CR triggers the following:

  • Performs initialization tasks required for OpenShift Container Storage. If needed, these tasks can be triggered to run again by deleting the OCSInitialization CR.

    • Ensures that the required Security Context Constraints (SCCs) for OpenShift Container Storage are present.
  • Manages the deployment of the Ceph toolbox Pod, used for performing advanced troubleshooting and recovery operations.

The StorageCluster CRD represents the system that provides the full functionality of OpenShift Container Storage. It triggers the operator to ensure the generation and reconciliation of Rook-Ceph and NooBaa CRDs. The ocs-operator algorithmically generates the CephCluster and NooBaa CRDs based on the configuration in the StorageCluster spec. The operator also creates additional CRs, such as CephBlockPools, Routes, and so on. These resources are required for enabling different features of OpenShift Container Storage. Currently, only one StorageCluster CR per OpenShift Container Platform cluster is supported.

3.2.4. Resources

The ocs-operator creates the following CRs in response to the spec of the CRDs it defines. The configuration of some of these resources can be overridden, allowing for changes to the generated spec, or for not creating them altogether.

General resources
Events
Creates various events when required in response to reconciliation.
Persistent Volumes (PVs)
PVs are not created directly by the operator. However, the operator keeps track of all the PVs created by the Ceph CSI drivers and ensures that the PVs have appropriate annotations for the supported features.
Quickstarts
Deploys various Quickstart CRs for the OpenShift Container Platform Console.
Rook-Ceph resources
CephBlockPool
Define the default Ceph block pools.
CephFilesystem
Define the default Ceph filesystem.
Route
Define the route for the Ceph object store.
StorageClass
Define the default storage classes (for example, for CephBlockPool and CephFilesystem).
VolumeSnapshotClass
Define the default volume snapshot classes for the corresponding storage classes.
Multicloud Object Gateway resources
NooBaa
Define the default Multicloud Object Gateway system.
Monitoring resources
  • Metrics Exporter Service
  • Metrics Exporter Service Monitor
  • PrometheusRules

3.2.5. Limitation

The ocs-operator neither deploys nor reconciles the other Pods of OpenShift Data Foundation. The ocs-operator CSV defines the top-level components, such as operator Deployments, and the Operator Lifecycle Manager (OLM) reconciles the specified components.

3.2.6. High availability

As with most of the other operators, high availability is not a primary requirement for the ocs-operator Pod. In general, there are no operations that require or benefit from process distribution. OpenShift Container Platform quickly spins up a replacement Pod whenever the current Pod becomes unavailable or is deleted.

3.2.7. Relevant config files

The ocs-operator configuration is entirely specified by the CSV and is not modifiable without a custom build of the CSV.

3.2.8. Relevant log files

To understand OpenShift Container Storage and troubleshoot issues, you can look at the following:

  • Operator Pod logs
  • StorageCluster status and events
  • OCSInitialization status

Operator Pod logs

Each operator provides standard Pod logs that include information about reconciliation and errors encountered. These logs often have information about successful reconciliation which can be filtered out and ignored.

StorageCluster status and events

The StorageCluster CR stores the reconciliation details in the status of the CR and has associated events. Status contains a section of the expected container images. It shows the container images that it expects to be present in the pods from other operators and the images that it currently detects. This helps to determine whether the OpenShift Container Storage upgrade is complete.

OCSInitialization status

This status shows whether the initialization tasks are completed successfully.

3.2.9. Lifecycle

The ocs-operator is required to be present as long as the OpenShift Container Storage bundle remains installed. This is managed as part of OLM’s reconciliation of the OpenShift Container Storage CSV. At least one instance of the pod should be in Ready state.

The operator operands such as CRDs should not affect the lifecycle of the operator. An OCSInitialization CR should always exist. The operator creates one if it does not exist. The creation and deletion of StorageClusters is an operation outside the operator’s control and must be initiated by the administrator or automated with the appropriate API calls.

3.3. Rook-Ceph operator

Rook-Ceph operator is the Rook operator for Ceph in the OpenShift Data Foundation. Rook enables Ceph storage systems to run on the OpenShift Container Platform.

The Rook-Ceph operator is a simple container that automatically bootstraps the storage clusters and monitors the storage daemons to ensure the storage clusters are healthy.

3.3.1. Components

The Rook-Ceph operator manages a number of components as part of the OpenShift Data Foundation deployment.

Ceph daemons
Mons
The monitors (mons) provide the core metadata store for Ceph.
OSDs
The object storage daemons (OSDs) store the data on underlying devices.
Mgr
The manager (mgr) collects metrics and provides other internal functions for Ceph.
RGW
The RADOS Gateway (RGW) provides the S3 endpoint to the object store.
MDS
The metadata server (MDS) provides CephFS shared volumes.

3.3.2. Design diagram

The following image illustrates how Ceph Rook integrates with OpenShift Container Platform.

Figure 3.3. Rook-Ceph Operator


With Ceph running in the OpenShift Container Platform cluster, OpenShift Container Platform applications can mount block devices and filesystems managed by Rook-Ceph, or can use the S3/Swift API for object storage.

3.3.3. Responsibilities

The Rook-Ceph operator is a container that bootstraps and monitors the storage cluster. It performs the following functions:

  • Automates the configuration of storage components
  • Starts, monitors, and manages the Ceph monitor pods and Ceph OSD daemons to provide the RADOS storage cluster
  • Initializes the pods and other artifacts to run the services to manage:

    • CRDs for pools
    • Object stores (S3/Swift)
    • Filesystems
  • Monitors the Ceph mons and OSDs to ensure that the storage remains available and healthy
  • Deploys and manages the placement of Ceph mons, adjusting the mon configuration based on the cluster size
  • Watches the desired state changes requested by the API service and applies the changes
  • Initializes the Ceph-CSI drivers that are needed for consuming the storage
  • Automatically configures the Ceph-CSI driver to mount the storage to pods

Rook-Ceph Operator architecture


The Rook-Ceph operator image includes all the tools required to manage the cluster. There is no change to the data path. However, the operator does not expose all Ceph configurations. Many Ceph features, such as placement groups and CRUSH maps, are hidden from users and replaced with a better user experience in terms of physical resources, pools, volumes, filesystems, and buckets.

3.3.4. Resources

Rook-Ceph operator adds owner references to all the resources it creates in the openshift-storage namespace. When the cluster is uninstalled, the owner references ensure that the resources are all cleaned up. This includes OpenShift Container Platform resources such as configmaps, secrets, services, deployments, daemonsets, and so on.

The Rook-Ceph operator watches CRs to configure the settings determined by OpenShift Data Foundation, which includes CephCluster, CephObjectStore, CephFilesystem, and CephBlockPool.
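For example, a minimal CephBlockPool of the kind that the operator watches can be sketched as follows. The name and values mirror the defaults described in this document and are shown for illustration; in practice the ocs-operator generates this CR:

```yaml
# Illustrative CephBlockPool; three replicas spread across host failure domains.
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: ocs-storagecluster-cephblockpool
  namespace: openshift-storage
spec:
  failureDomain: host
  replicated:
    size: 3
```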

3.3.5. Lifecycle

Rook-Ceph operator manages the lifecycle of the following pods in the Ceph cluster:

Rook operator
A single pod that owns the reconcile of the cluster.
RBD CSI Driver
  • Two provisioner pods, managed by a single deployment.
  • One plugin pod per node, managed by a daemonset.
CephFS CSI Driver
  • Two provisioner pods, managed by a single deployment.
  • One plugin pod per node, managed by a daemonset.
Monitors (mons)

Three mon pods, each with its own deployment.

Stretch clusters
Contain five mon pods, one in the arbiter zone and two in each of the other two data zones.
Manager (mgr)

There is a single mgr pod for the cluster.

Stretch clusters
There are two mgr pods (starting with OpenShift Data Foundation 4.8), one in each of the two non-arbiter zones.
Object storage daemons (OSDs)
At least three OSDs are created initially in the cluster. More OSDs are added when the cluster is expanded.
Metadata server (MDS)
The CephFS metadata server has a single pod.
RADOS gateway (RGW)
The Ceph RGW daemon has a single pod.

3.4. MCG operator

The Multicloud Object Gateway (MCG) operator is an operator for OpenShift Data Foundation along with the OpenShift Data Foundation operator and the Rook-Ceph operator. The MCG operator is available upstream as a standalone operator.

The MCG operator performs the following primary functions:

  • Controls and reconciles the Multicloud Object Gateway (MCG) component within OpenShift Data Foundation.
  • Manages new user resources such as object bucket claims, bucket classes, and backing stores.
  • Creates the default out-of-the-box resources.

Some configuration and information is passed to the MCG operator through the OpenShift Data Foundation operator.

3.4.1. Components

The MCG operator does not have sub-components. However, it consists of a reconcile loop for the different resources that are controlled by it.

The MCG operator has a command-line interface (CLI) and is available as a part of OpenShift Data Foundation. It enables the creation, deletion, and querying of various resources. Unlike applying a YAML file directly, this CLI adds a layer of input sanitization and status validation before the configurations are applied.

3.4.2. Responsibilities and resources

The MCG operator reconciles and is responsible for the following custom resource definitions (CRDs) and OpenShift Container Platform entities:

  • Backing store
  • Namespace store
  • Bucket class
  • Object bucket claims (OBCs)
  • NooBaa, pod stateful sets CRD
  • Prometheus Rules and Service Monitoring
  • Horizontal pod autoscaler (HPA)

Backing store

A resource that the customer has connected to the MCG component. This resource provides MCG with the ability to save the data of the provisioned buckets on top of it.

A default backing store is created as part of the deployment depending on the platform that the OpenShift Container Platform is running on. For example, when OpenShift Container Platform or OpenShift Data Foundation is deployed on Amazon Web Services (AWS), it results in a default backing store which is an AWS::S3 bucket. Similarly, for Microsoft Azure, the default backing store is a blob container and so on.

The default backing stores are created using CRDs for the cloud credential operator, which comes with OpenShift Container Platform. There is no limit on the number of backing stores that can be added to MCG. The backing stores are used in the bucket class CRD to define the different policies of the bucket. Refer to the documentation of the specific OpenShift Data Foundation version to identify the types of services or resources supported as backing stores.
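For illustration, a backing store on top of an AWS S3 bucket might be defined as follows. The bucket, secret, and resource names are hypothetical:

```yaml
# Hypothetical BackingStore backed by an AWS S3 bucket; the referenced
# Secret is assumed to hold the S3 credentials.
apiVersion: noobaa.io/v1alpha1
kind: BackingStore
metadata:
  name: aws-backing-store
  namespace: openshift-storage
spec:
  type: aws-s3
  awsS3:
    targetBucket: my-noobaa-backing-bucket   # hypothetical bucket
    region: us-east-1
    secret:
      name: aws-s3-credentials               # hypothetical Secret
      namespace: openshift-storage
```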

Namespace store

Resources that are used in namespace buckets. No default is created during deployment.

Bucketclass

A default or initial policy for a newly provisioned bucket. The following policies are set in a bucketclass:

Placement policy

Indicates the backing stores to be attached to the bucket and used to write the data of the bucket. This policy is used for data buckets and for cache policies to indicate the local cache placement. There are two modes of placement policy:

  • Spread. Stripes the data across the defined backing stores
  • Mirror. Creates a full replica on each backing store
Namespace policy
A policy for the namespace buckets that defines the resources that are being used for aggregation and the resource used for the write target.
Cache Policy
This is a policy for the bucket and sets the hub (the source of truth) and the time to live (TTL) for the cache items.

A default bucket class is created during deployment, and it is set with a placement policy that uses the default backing store. There is no limit to the number of bucket classes that can be added.

Refer to the documentation of the specific OpenShift Data Foundation version to identify the types of policies that are supported.
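A bucket class with a Spread placement policy can be sketched as follows. The backing store names are hypothetical:

```yaml
# Illustrative BucketClass; data is striped across two hypothetical
# backing stores in a single tier.
apiVersion: noobaa.io/v1alpha1
kind: BucketClass
metadata:
  name: spread-bucket-class
  namespace: openshift-storage
spec:
  placementPolicy:
    tiers:
      - placement: Spread           # alternative: Mirror
        backingStores:
          - aws-backing-store-1
          - aws-backing-store-2
```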

Object bucket claims (OBCs)

CRDs that enable provisioning of S3 buckets. With MCG, OBCs receive an optional bucket class to note the initial configuration of the bucket. If a bucket class is not provided, the default bucket class is used.
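An ObjectBucketClaim can be sketched as follows. The names are hypothetical, and omitting the bucket class selects the default one:

```yaml
# Hypothetical OBC provisioning an S3 bucket through MCG.
apiVersion: objectbucket.io/v1alpha1
kind: ObjectBucketClaim
metadata:
  name: my-bucket-claim
  namespace: my-app
spec:
  generateBucketName: my-bucket               # prefix for the generated bucket name
  storageClassName: openshift-storage.noobaa.io
  additionalConfig:
    bucketclass: spread-bucket-class          # optional; default used if omitted
```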

NooBaa, pod stateful sets CRD

An internal CRD that controls the different pods of the NooBaa deployment such as the DB pod, the core pod, and the endpoints. This CRD must not be changed as it is internal. This operator reconciles the following entities:

  • DB pod SCC
  • Role Binding and Service Account to allow single sign-on (SSO) between OpenShift Container Platform and NooBaa user interfaces
  • Route for S3 access
  • Certificates that are signed by OpenShift Container Platform and set on the S3 route

Prometheus rules and service monitoring

These CRDs set up scraping points for Prometheus and alert rules that are supported by MCG.

Horizontal pod autoscaler (HPA)

The HPA is integrated with the MCG endpoints. The endpoint pods scale up and down according to CPU pressure (the amount of S3 traffic).

3.4.3. High availability

As an operator, the only high availability provided is that the OpenShift Container Platform reschedules a failed pod.

3.4.4. Relevant log files

To troubleshoot issues with the NooBaa operator, you can look at the following:

  • Operator pod logs, which are also available through the must-gather.
  • Different CRDs or entities and their statuses that are available through the must-gather.

3.4.5. Lifecycle

The MCG operator runs and reconciles after OpenShift Data Foundation is deployed and until it is uninstalled.

Chapter 4. OpenShift Data Foundation installation overview

OpenShift Data Foundation consists of multiple components managed by multiple operators.

4.1. Installed Operators

When you install OpenShift Data Foundation from the Operator Hub, the following four separate Deployments are created:

  • odf-operator: Defines the odf-operator Pod
  • ocs-operator: Defines the ocs-operator Pod which runs processes for ocs-operator and its metrics-exporter in the same container.
  • rook-ceph-operator: Defines the rook-ceph-operator Pod.
  • mcg-operator: Defines the mcg-operator Pod.

These operators run independently and interact with each other by creating custom resources (CRs) that are watched by the other operators. The ocs-operator is primarily responsible for creating the CRs to configure Ceph storage and the Multicloud Object Gateway. The mcg-operator sometimes creates Ceph volumes for use by its components.

4.2. OpenShift Container Storage initialization

The OpenShift Data Foundation bundle also defines an external plugin to the OpenShift Container Platform Console, adding new screens and functionality not otherwise available in the Console. This plugin runs as a web server in the odf-console-plugin Pod, which is managed by a Deployment created by the OLM at the time of installation.

The ocs-operator automatically creates an OCSInitialization CR after it gets created. Only one OCSInitialization CR exists at any point in time. It controls the ocs-operator behaviors that are not restricted to the scope of a single StorageCluster, but only performs them once. When you delete the OCSInitialization CR, the ocs-operator creates it again and this allows you to re-trigger its initialization operations.

The OCSInitialization CR controls the following behaviors:

SecurityContextConstraints (SCCs)
After the OCSInitialization CR is created, the ocs-operator creates various SCCs for use by the component Pods.
Ceph Toolbox Deployment
You can use the OCSInitialization to deploy the Ceph Toolbox Pod for the advanced Ceph operations.
Rook-Ceph Operator Configuration
This configuration creates the rook-ceph-operator-config ConfigMap that governs the overall configuration for rook-ceph-operator behavior.
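For example, deploying the Ceph toolbox can be sketched by patching the OCSInitialization CR. The CR name and field shown here are assumptions based on current releases:

```yaml
# Sketch: enabling the Ceph toolbox Pod through the OCSInitialization CR.
apiVersion: ocs.openshift.io/v1
kind: OCSInitialization
metadata:
  name: ocsinit                 # assumed singleton name
  namespace: openshift-storage
spec:
  enableCephTools: true         # deploys the Ceph toolbox Pod
```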

4.3. Storage cluster creation

The OpenShift Data Foundation operators themselves provide no storage functionality, and the desired storage configuration must be defined.

After you install the operators, create a new StorageCluster using either the OpenShift Container Platform console wizard or the CLI; the ocs-operator then reconciles this StorageCluster. OpenShift Data Foundation supports a single StorageCluster per installation. Any StorageCluster CRs created after the first one are ignored by ocs-operator reconciliation.

OpenShift Data Foundation allows the following StorageCluster configurations:

Internal
In the Internal mode, all the components run containerized within the OpenShift Container Platform cluster and use dynamically provisioned persistent volumes (PVs) created against the StorageClass specified by the administrator in the installation wizard.
Internal-attached
This mode is similar to the Internal mode, but the administrator is required to define the local storage devices directly attached to the cluster nodes that Ceph uses for its backing storage. Also, the administrator needs to create the CRs that the local storage operator reconciles to provide the StorageClass. The ocs-operator uses this StorageClass as the backing storage for Ceph.
External
In this mode, Ceph components do not run inside the OpenShift Container Platform cluster; instead, connectivity is provided to an external OpenShift Container Storage installation for which the applications can create PVs. The other components run within the cluster as required.
MCG Standalone
This mode facilitates the installation of a Multicloud Object Gateway system without an accompanying CephCluster.

After a StorageCluster CR is found, ocs-operator validates it and begins to create subsequent resources to define the storage components.
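As a sketch, an internal-mode StorageCluster spec might look like the following. The backing storage class name and sizes are illustrative:

```yaml
# Illustrative internal-mode StorageCluster; values are examples only.
apiVersion: ocs.openshift.io/v1
kind: StorageCluster
metadata:
  name: ocs-storagecluster
  namespace: openshift-storage
spec:
  storageDeviceSets:
    - name: ocs-deviceset
      count: 1                  # number of device sets
      replica: 3                # one OSD per replica, spread across failure domains
      dataPVCTemplate:
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 512Gi
          storageClassName: gp3-csi   # assumed backing StorageClass
          volumeMode: Block
```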

4.3.1. Internal mode storage cluster

Both internal and internal-attached storage clusters have the same setup process as follows:

StorageClasses

Create the storage classes that cluster applications use to create Ceph volumes.

SnapshotClasses

Create the volume snapshot classes that the cluster applications use to create snapshots of Ceph volumes.

Ceph RGW configuration

Create various Ceph object CRs to enable and provide access to the Ceph RGW object storage endpoint.

Ceph RBD Configuration

Create the CephBlockPool CR to enable RBD storage.

CephFS Configuration

Create the CephFilesystem CR to enable CephFS storage.

Rook-Ceph Configuration

Create the rook-config-override ConfigMap that governs the overall behavior of the underlying Ceph cluster.

CephCluster

Create the CephCluster CR to trigger Ceph reconciliation from rook-ceph-operator. For more information, see Rook-Ceph operator.

NoobaaSystem

Create the NooBaa CR to trigger reconciliation from mcg-operator. For more information, see MCG operator.

Job templates

Create OpenShift Template CRs that define Jobs to run administrative operations for OpenShift Container Storage.

Quickstarts

Create the QuickStart CRs that display the quickstart guides in the Web Console.

Note

The RGW (RADOS Gateway) component is deployed only in on-premises environments. It is not created for cloud-based deployments.

4.3.1.1. Cluster Creation

After the ocs-operator creates the CephCluster CR, the rook-operator creates the Ceph cluster according to the desired configuration. The rook-operator configures the following components:

Ceph mon daemons

Three Ceph mon daemons are started on different nodes in the cluster. They manage the core metadata for the Ceph cluster and must form a majority quorum. The metadata for each mon is backed by either a PV in a cloud environment or a path on the local host in a local storage device environment.

Ceph mgr daemon

This daemon is started to gather metrics for the cluster and report them to Prometheus.

Ceph OSDs

These OSDs are created according to the configuration of the storageClassDeviceSets. Each OSD consumes a PV that stores the user data. By default, Ceph maintains three replicas of the application data across different OSDs for high durability and availability using the CRUSH algorithm.

CSI provisioners

These provisioners are started for RBD and CephFS. When volumes are requested for the storage classes of OpenShift Container Storage, the requests are directed to the Ceph-CSI driver to provision the volumes in Ceph.

CSI volume plugins and CephFS

The CSI volume plugins for RBD and CephFS are started on each node in the cluster. The volume plugin needs to be running wherever the Ceph volumes are required to be mounted by the applications.

After the CephCluster CR is configured, Rook reconciles the remaining Ceph CRs to complete the setup:

CephBlockPool

The CephBlockPool CR provides the configuration for Rook operator to create Ceph pools for RWO volumes.

CephFilesystem

The CephFilesystem CR instructs the Rook operator to configure a shared file system with CephFS, typically for RWX volumes. The CephFS metadata server (MDS) is started to manage the shared volumes.

CephObjectStore

The CephObjectStore CR instructs the Rook operator to configure an object store with the RGW service.

CephObjectStoreUser CR

The CephObjectStoreUser CR instructs the Rook operator to configure an object store user for NooBaa to consume, publishing access/private key as well as the CephObjectStore endpoint.
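As a sketch of the first of these CRs, a CephBlockPool might look like the following. The pool name matches the usual OpenShift Data Foundation default, and the spec values are typical Rook settings rather than guaranteed defaults.

```yaml
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: ocs-storagecluster-cephblockpool   # typical default pool name (assumption)
  namespace: openshift-storage
spec:
  failureDomain: host        # spread replicas across different hosts
  replicated:
    size: 3                  # keep three copies of each object
```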

The operator monitors the Ceph health to ensure that the storage platform remains healthy. If a mon daemon is down for too long (10 minutes), Rook starts a new mon in its place so that full quorum is restored.

When the ocs-operator updates the CephCluster CR, Rook immediately responds to the requested changes to update the cluster configuration.

4.3.1.2. NooBaa System creation

When a NooBaa system is created, the mcg-operator reconciles the following:

Default BackingStore

Depending on the platform that OpenShift Container Platform and OpenShift Data Foundation are deployed on, a default backing store resource is created so that buckets can use it for their placement policy. The different options are as follows:

Amazon Web Services (AWS) deployment

The mcg-operator uses the CloudCredentialsOperator (CCO) to mint credentials in order to create a new AWS::S3 bucket and creates a BackingStore on top of that bucket.

Microsoft Azure deployment

The mcg-operator uses the CCO to mint credentials in order to create a new Azure Blob and creates a BackingStore on top of that bucket.

Google Cloud Platform (GCP) deployment

The mcg-operator uses the CCO to mint credentials in order to create a new GCP bucket and creates a BackingStore on top of that bucket.

On-prem deployment

If RGW exists, the mcg-operator creates a new CephUser and a new bucket on top of RGW and creates a BackingStore on top of that bucket.

None of the previously mentioned deployments are applicable

The mcg-operator creates a pv-pool based on the default storage class and creates a BackingStore on top of that pool.

Default BucketClass

A BucketClass with a placement policy to the default BackingStore is created.

NooBaa pods

The following NooBaa pods are created and started:

Database (DB)

This is a Postgres DB holding metadata, statistics, events, and so on. However, it does not hold the actual data being stored.

Core

This is the pod that handles configuration, background processes, metadata management, statistics, and so on.

Endpoints

These pods perform the actual I/O-related work such as deduplication and compression, communicating with different services to write and read data, and so on. The endpoints are integrated with the HorizontalPodAutoscaler, and their number increases and decreases according to the CPU usage observed on the existing endpoint pods.

Route

A Route for the NooBaa S3 interface is created for applications that use S3.

Service

A Service for the NooBaa S3 interface is created for applications that use S3.
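Applications typically obtain MCG buckets through an ObjectBucketClaim against the NooBaa storage class. A minimal sketch, assuming the default openshift-storage.noobaa.io class name created by the mcg-operator:

```yaml
# Example ObjectBucketClaim; NooBaa provisions a bucket and publishes
# its endpoint and credentials in a ConfigMap and Secret of the same name
apiVersion: objectbucket.io/v1alpha1
kind: ObjectBucketClaim
metadata:
  name: my-bucket-claim
spec:
  generateBucketName: my-bucket            # prefix for the generated bucket name
  storageClassName: openshift-storage.noobaa.io
```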

4.3.2. External mode storage cluster

For external storage clusters, ocs-operator follows a slightly different setup process. The ocs-operator looks for the existence of the rook-ceph-external-cluster-details ConfigMap, which must be created externally, either by the administrator or by the Console. For information about how to create the ConfigMap, see Creating an OpenShift Data Foundation Cluster for external mode. The ocs-operator then creates some or all of the following resources, as specified in the ConfigMap:

External Ceph Configuration

A ConfigMap that specifies the endpoints of the external mons.

External Ceph Credentials Secret

A Secret that contains the credentials to connect to the external Ceph instance.

External Ceph StorageClasses

One or more StorageClasses to enable the creation of volumes for RBD, CephFS, and/or RGW.

Enable CephFS CSI Driver

If a CephFS StorageClass is specified, configure rook-ceph-operator to deploy the CephFS CSI Pods.

Ceph RGW Configuration

If an RGW StorageClass is specified, create various Ceph Object CRs to enable and provide access to the Ceph RGW object storage endpoint.
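For illustration, an external-mode RBD StorageClass might look like the following. The class name, pool, and parameter values here are examples only; the actual values are derived from the exported external cluster details.

```yaml
# Illustrative external-mode RBD StorageClass (names and values are assumptions)
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ocs-external-storagecluster-ceph-rbd
provisioner: openshift-storage.rbd.csi.ceph.com   # Ceph-CSI RBD driver
parameters:
  clusterID: openshift-storage
  pool: rbd-pool                 # pool on the external Ceph cluster (assumption)
  imageFeatures: layering
  csi.storage.k8s.io/fstype: ext4
reclaimPolicy: Delete
volumeBindingMode: Immediate
```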

After creating the resources specified in the ConfigMap, the StorageCluster creation process proceeds as follows:

CephCluster

Create the CephCluster CR to trigger Ceph reconciliation from rook-ceph-operator (see subsequent sections).

SnapshotClasses

Create the SnapshotClasses that applications use to create snapshots of Ceph volumes.

NoobaaSystem

Create the NooBaa CR to trigger reconciliation from the noobaa-operator (see subsequent sections).

QuickStarts

Create the Quickstart CRs that display the quickstart guides in the Console.

4.3.2.1. Cluster Creation

The Rook operator performs the following operations when the CephCluster CR is created in external mode:

  • The operator validates that a connection is available to the remote Ceph cluster. The connection requires mon endpoints and secrets to be imported into the local cluster.
  • The CSI driver is configured with the remote connection to Ceph. The RBD and CephFS provisioners and volume plugins are started just as in internal mode; the only difference is that the connection to Ceph is external to the OpenShift cluster.
  • The operator periodically watches for monitor address changes and updates the Ceph-CSI configuration accordingly.
4.3.2.2. NooBaa System creation

When a NooBaa system is created, the mcg-operator reconciles the following:

Default BackingStore

Depending on the platform that OpenShift Container Platform and OpenShift Data Foundation are deployed on, a default backing store resource is created so that buckets can use it for their placement policy. The different options are as follows:

Amazon Web Services (AWS) deployment

The mcg-operator uses the CloudCredentialsOperator (CCO) to mint credentials in order to create a new AWS::S3 bucket and creates a BackingStore on top of that bucket.

Microsoft Azure deployment

The mcg-operator uses the CCO to mint credentials in order to create a new Azure Blob and creates a BackingStore on top of that bucket.

Google Cloud Platform (GCP) deployment

The mcg-operator uses the CCO to mint credentials in order to create a new GCP bucket and creates a BackingStore on top of that bucket.

On-prem deployment

If RGW exists, the mcg-operator creates a new CephUser and a new bucket on top of RGW and creates a BackingStore on top of that bucket.

None of the previously mentioned deployments are applicable

The mcg-operator creates a pv-pool based on the default storage class and creates a BackingStore on top of that pool.

Default BucketClass

A BucketClass with a placement policy to the default BackingStore is created.

NooBaa pods

The following NooBaa pods are created and started:

Database (DB)

This is a Postgres DB holding metadata, statistics, events, and so on. However, it does not hold the actual data being stored.

Core

This is the pod that handles configuration, background processes, metadata management, statistics, and so on.

Endpoints

These pods perform the actual I/O-related work such as deduplication and compression, communicating with different services to write and read data, and so on. The endpoints are integrated with the HorizontalPodAutoscaler, and their number increases and decreases according to the CPU usage observed on the existing endpoint pods.

Route

A Route for the NooBaa S3 interface is created for applications that use S3.

Service

A Service for the NooBaa S3 interface is created for applications that use S3.

4.3.3. MCG Standalone StorageCluster

In this mode, no CephCluster is created. Instead, a NooBaa system CR is created using default values to take advantage of pre-existing StorageClasses in OpenShift Container Platform.

4.3.3.1. NooBaa System creation

When a NooBaa system is created, the mcg-operator reconciles the following:

Default BackingStore

Depending on the platform that OpenShift Container Platform and OpenShift Data Foundation are deployed on, a default backing store resource is created so that buckets can use it for their placement policy. The different options are as follows:

Amazon Web Services (AWS) deployment

The mcg-operator uses the CloudCredentialsOperator (CCO) to mint credentials in order to create a new AWS::S3 bucket and creates a BackingStore on top of that bucket.

Microsoft Azure deployment

The mcg-operator uses the CCO to mint credentials in order to create a new Azure Blob and creates a BackingStore on top of that bucket.

Google Cloud Platform (GCP) deployment

The mcg-operator uses the CCO to mint credentials in order to create a new GCP bucket and creates a BackingStore on top of that bucket.

On-prem deployment

If RGW exists, the mcg-operator creates a new CephUser and a new bucket on top of RGW and creates a BackingStore on top of that bucket.

None of the previously mentioned deployments are applicable

The mcg-operator creates a pv-pool based on the default storage class and creates a BackingStore on top of that pool.

Default BucketClass

A BucketClass with a placement policy to the default BackingStore is created.

NooBaa pods

The following NooBaa pods are created and started:

Database (DB)

This is a Postgres DB holding metadata, statistics, events, and so on. However, it does not hold the actual data being stored.

Core

This is the pod that handles configuration, background processes, metadata management, statistics, and so on.

Endpoints

These pods perform the actual I/O-related work such as deduplication and compression, communicating with different services to write and read data, and so on. The endpoints are integrated with the HorizontalPodAutoscaler, and their number increases and decreases according to the CPU usage observed on the existing endpoint pods.

Route

A Route for the NooBaa S3 interface is created for applications that use S3.

Service

A Service for the NooBaa S3 interface is created for applications that use S3.

As an operator bundle managed by the Operator Lifecycle Manager (OLM), OpenShift Data Foundation leverages its operators to perform high-level tasks of installing and upgrading the product through ClusterServiceVersion (CSV) CRs.

5.1. Upgrade Workflows

OpenShift Data Foundation recognizes two types of upgrades: Z-stream release upgrades and Minor Version release upgrades. While the user interface workflows for these two upgrade paths are not quite the same, the resulting behaviors are fairly similar. The distinctions are as follows:

For Z-stream releases, OpenShift Container Storage publishes a new bundle in the redhat-operators CatalogSource. The OLM detects this and creates an InstallPlan for the new CSV to replace the existing CSV. The Subscription approval strategy, whether Automatic or Manual, determines whether the OLM proceeds with reconciliation or waits for administrator approval.

For Minor Version releases, OpenShift Container Storage also publishes a new bundle in the redhat-operators CatalogSource. The difference is that this bundle is part of a new channel, and channel upgrades are not automatic; the administrator must explicitly select the new release channel. Once this is done, the OLM detects the new bundle and creates an InstallPlan for the new CSV to replace the existing CSV. Because the channel switch is itself a deliberate manual operation, the OLM starts the reconciliation automatically.

From this point onwards, the upgrade processes are identical.
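The channel and approval strategy described above are set on the operator's Subscription object. A sketch, assuming the odf-operator package name and a stable-4.19 channel:

```yaml
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: odf-operator
  namespace: openshift-storage
spec:
  channel: stable-4.19          # switching channels here drives a minor-version upgrade
  installPlanApproval: Manual   # Automatic lets the OLM reconcile z-streams unattended
  name: odf-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
```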

5.2. ClusterServiceVersion Reconciliation

When the OLM detects an approved InstallPlan, it begins reconciling the CSVs. Broadly, it does this by updating the operator resources based on the new spec, verifying that the new CSV installs correctly, and then deleting the old CSV. The upgrade process pushes updates to the operator Deployments, which triggers a restart of the operator Pods using the images specified in the new CSV.

Note

While it is possible to make changes to a given CSV and have those changes propagate to the relevant resources, all custom changes are lost when upgrading to a new CSV, because the new CSV is created based on its unaltered spec.

5.3. Operator Reconciliation

At this point, the reconciliation of the OpenShift Data Foundation operands proceeds as defined in the OpenShift Data Foundation installation overview. The operators will ensure that all relevant resources exist in their expected configurations as specified in the user-facing resources (for example, StorageCluster).

This section describes processes and configurations that are supported when configuring OpenShift Data Foundation to use the OpenShift multi-network plugin (Multus). This architecture provides information about OpenShift Data Foundation requirements, processes, and features that are relevant to Multus.

For configurations that deviate from this reference architecture, open a support exception using the Jira ticket link: https://issues.redhat.com/projects/SUPPORTEX.

6.1. Assumptions

This Multus reference architecture section assumes the following:

  • OpenShift Data Foundation is deployed in internal mode using dedicated storage nodes.
  • OpenShift control plane nodes do not run applications that require OpenShift Data Foundation storage.

    This is not required for supportability, but some node selectors and/or tolerations might need to be modified for other cluster layouts.

  • VLANs are present. VLANs are optional, and are neither recommended nor discouraged; however, in many instances VLANs are required, depending on the user environment.

6.2. Example scenario used

This Multus reference architecture uses the following example:

  • Four dedicated OpenShift Data Foundation storage nodes
  • Two OpenShift worker nodes
  • Three OpenShift control plane nodes

The following diagram shows the example cluster network architecture in a traditional OpenShift Data Foundation deployment without Multus:

Figure 6.1. Example cluster network architecture without Multus

Example cluster network architecture without Multus

Multus networking for an OpenShift Data Foundation deployment is chosen for one or more of the following reasons:

  • To reduce latency of the storage system by avoiding the OpenShift Pod (software) network
  • To dedicate bandwidth to the storage system to avoid the noisy neighbor problem between storage and applications
  • To segregate the storage network from applications for isolation

Multus is enabled for either or both of the following types of OpenShift Data Foundation storage networks:

  • Public network
  • Cluster network

However, it is recommended to configure only the Multus public network as a starting point.

6.3.1. Multus public network

The following diagram shows the cluster network architecture for an OpenShift Data Foundation deployment using the Multus public network:

Figure 6.2. OpenShift Data Foundation deployment using Multus public network

OpenShift Data Foundation deployment using Multus public network

This architecture achieves all three benefits of Multus networking:

  • Storage latency is reduced by avoiding the software-defined pod network
  • The storage system gets dedicated bandwidth, which avoids noisy-neighbor interference
  • The storage network is isolated from applications, which prevents pods from snooping on storage traffic

However, if the public network has less throughput than the Pod network, storage traffic might not be faster than a traditional deployment.

The public network is the recommended architecture for the majority of OpenShift Data Foundation cluster deployments. In this architecture, all storage traffic uses the dedicated public network: I/O traffic from applications, data replication traffic within the storage system, and data recovery traffic within the storage system.

6.3.2. Multus public and cluster networks

The following diagram shows the cluster network architecture for an OpenShift Data Foundation deployment using both Multus public network and Multus cluster network:

Figure 6.3. Cluster network architecture using both Multus public network and Multus cluster network

Cluster network architecture using both Multus public network and Multus cluster network

When both public and cluster networks are used simultaneously, all three benefits of Multus are achieved. In this case, each storage network performs a specific role in the OpenShift Data Foundation cluster. The public network handles only the I/O traffic from applications and the cluster network handles storage replication and recovery traffic.

The cluster network handles data replication traffic on a regular basis. Assuming OpenShift Data Foundation’s default 3x replication, the cluster network carries 2x the application I/O traffic for each data write, bringing the total number of storage data copies to three. During a storage node or zone failure event, the cluster network also handles recovering data in the cluster.

In this architecture, the recommended size of the cluster network is 2-3x that of the public network. Larger sizing keeps failover times short, but also wastes a large amount of bandwidth during normal cluster operations. During a storage failure event, the overall storage bandwidth might be limited by either the public or the cluster network, depending on the size of either network.

This architecture is more complicated than the recommended public-only architecture and has many more factors that could affect user SLOs. Hence, as an alternative, OpenShift Data Foundation suggests bonding all storage network interfaces together to create a single, larger public network instead.

6.3.3. Multus cluster network

The following diagram shows the cluster network architecture for an OpenShift Data Foundation deployment using Multus cluster network:

Figure 6.4. Cluster network architecture

Cluster network architecture

In the cluster network architecture, the Multus benefits are partially achieved. The cluster network takes on the replication and recovery roles, while the OpenShift Pod network handles storage I/O from applications. This minimizes noisy-neighbor interference between storage replication and applications, but application I/O can still be affected by other application networking, and vice versa.

Application I/O to the storage system is expected to have higher latency than other Multus architectures due to the software-defined Pod network. Similarly, the storage network is not isolated from applications. For example, a privileged Pod could still snoop storage traffic.

6.4. Multus networking requirements

This section covers the following requirements for using Multus networking in an OpenShift Data Foundation deployment:

  • General requirements
  • Routing requirements
  • Address space sizing requirements

6.4.1. General requirements

The general requirements for Multus networking are as follows:

  • The interface used for the storage networks must have the same name on each OpenShift storage and worker node.
  • Interfaces must all be connected to the same underlying network.
  • When both public and cluster networks are used, they should not be connected to each other.
  • At least one additional network for storage is required.
  • Storage networks must be at least 10 Gbps. Bonded links are supported and suggested.

OpenShift Data Foundation with Multus creates many virtual MAC addresses on each storage network host interface. This requires network cards and switch hardware to support promiscuous mode. Some smart switches have security processes that inspect each packet, and these switches can bog down significantly due to the high number of MAC addresses for OpenShift Data Foundation. It is best to disable smart switch security features that require advanced processing.

Similarly, virtual or software switches might struggle to process the large number of MAC addresses. Some are not able to disable this functionality. Therefore, do not use virtual switches without a support exception.

6.4.2. Routing requirements

For OpenShift Data Foundation Persistent Volume Claims (PVCs) to work properly, the Ceph-CSI driver pods deployed to each OpenShift node (workers and storage) must use host networking. Ceph-CSI drivers must also be able to reach OpenShift Data Foundation Ceph cluster through the Multus public network. This means that each OpenShift node must have a connection to the public network, and vice versa.

To make this as simple as possible and avoid the risk of IP overlap, select different subnets (IP ranges) for node connections to the public network and for Pod connections to the public network. Set up routing between the node public subnet and the Pod public subnet so that they route to each other. The final result is that the node public subnet and the Pod public subnet act together as a single contiguous public network.

If the cluster network is used, do not route the Pod public network or node public network to the cluster network. The cluster network should be independent and isolated. The diagram below illustrates subnet routing requirements:

Figure 6.5. Subnet routing requirements

Subnet routing requirements

6.4.3. Address space sizing

OpenShift Data Foundation with Multus requires an IP address for each OpenShift Data Foundation Ceph storage daemon. Networks must have enough IP addresses to account for the number of storage pods that get attached to the network, plus some additional space to account for failover events. Expanding networks in the future might require downtime, so allocating space for the largest foreseeable cluster size is recommended.

For Multus public network, the following ranges must be planned:

  • The Pod public network address space must include enough IPs for the total number of ODF pods running in the openshift-storage namespace
  • The node public network address space must include enough IPs for the total number of OpenShift nodes (worker + storage)

For Multus cluster network, the following range must be planned:

  • The Pod cluster network address space must include enough IPs for the total number of OSD pods running in the openshift-storage namespace

6.5. Recommendations for Multus networking

This section describes the general and address range recommendations for Multus networking on OpenShift Data Foundation deployment.

6.5.1. General recommendations

Red Hat Ceph Storage underpins OpenShift Data Foundation. The Red Hat Ceph Storage networking recommendations are useful for understanding OpenShift Data Foundation Multus networking in more depth and for finding specific recommendations. For more information, see Network considerations for Red Hat Ceph Storage, which also applies to OpenShift Data Foundation networking.

6.5.2. Address ranges

The following table recommends IP ranges for each Multus network or subnetwork:

Table 6.1. Recommended IP ranges

Network               Network range CIDR   Approximate maximum                        Public or cluster network
Pod public network    192.168.240.0/21     1600 total OpenShift Data Foundation pods  Public network
Pod cluster network   192.168.248.0/22     800 OSDs                                   Cluster network
Node public network   192.168.252.0/23     400 OpenShift worker+storage nodes         Public network

This should suffice for most organizations. The recommendation uses the last 6.25% (1/16th) of the 192.168.0.0/16 reserved private address space. The approximate maximum OpenShift Data Foundation and OpenShift sizing noted above accounts for 25% overhead to gracefully handle failure events.

6.6. Planning host interface connections

OpenShift nodes can be separated into the following three types:

  • OpenShift control plane nodes
  • OpenShift worker nodes
  • OpenShift Data Foundation storage nodes

Each type requires different connections to the storage cluster as follows:

  • All OpenShift nodes require access to the OpenShift machine network
  • Control plane nodes do not require any connection to storage networks

    • If OpenShift Data Foundation PVCs are needed on control plane nodes (not recommended), treat them as worker nodes
    • If control plane nodes are used as OpenShift Data Foundation storage nodes, treat them as storage nodes
  • Worker nodes run Ceph-CSI driver daemons to mount PVC storage to Pods and therefore require a connection to the OpenShift Data Foundation public network (if used)
  • Storage nodes require access to all used storage networks

6.6.1. Example interfaces and connections

The following diagram shows example interfaces and connections that might be planned for each node type:

Figure 6.6. Example interfaces and connections

Example interfaces and connections

This example includes cluster network for completeness.

In this example, vlan220 (public) and vlan221 (cluster) are the interfaces to use for OpenShift Data Foundation configuration. If VLANs are unused in the deployed environment, bond1 and bond2 would be used instead. If no bonds or VLANs are used, host interfaces (for example, eth2 and eth4) would be used instead.

Important

The “ODF public network” in the diagram accounts for both the OpenShift Data Foundation Pod public network and OpenShift Data Foundation node public network. The OpenShift machine network is used as a backbone for the OpenShift Pod network and is included for reference only. Do not mistake this network for the ODF node public network. Do not plan or execute routing that involves the machine network.

6.7. Multus network configuration

Multus networks are configured with NetworkAttachmentDefinition objects, which configure Pod network connections. OpenShift Network Attachment Definitions support many options; however, OpenShift Data Foundation supports only the following:

  • macvlan is the only supported CNI plugin
  • whereabouts is the only supported IP Address Management (IPAM) plugin

Diverging from these selections requires a support exception.

Address range values must also match those selected for the cluster. The example cluster, example host interfaces, and recommended network ranges are used here. Modify them as needed in the deployed environment.

6.7.1. Public NetworkAttachmentDefinition

The recommended Pod public network configuration for the example is as follows. Omit this if the Multus public network is not used. Make sure to include the routes section, which allows Pods on the public network to reach nodes through the public network.

apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: public-net
  namespace: openshift-storage
spec:
  config: |
    {
      "cniVersion": "0.3.1",
      "type": "macvlan",
      "master": "vlan220", # host public network interface
      "mode": "bridge",
      "ipam": {
        "type": "whereabouts",
        "range": "192.168.240.0/21", # pod public network range
        "routes": [
          {"dst": "192.168.252.0/23"} # node public network range
        ]
      }
    }

6.7.2. Cluster NetworkAttachmentDefinition

The recommended Pod cluster network configuration for the example is as follows. Omit this if the Multus cluster network is not used (recommended for most users). This configuration should not have a routes section.

apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: cluster-net
  namespace: openshift-storage
spec:
  config: |
    {
      "cniVersion": "0.3.1",
      "type": "macvlan",
      "master": "vlan221", # host cluster network interface
      "mode": "bridge",
      "ipam": {
        "type": "whereabouts",
        "range": "192.168.248.0/22" # pod cluster network range
      }
    }

OpenShift worker and storage nodes must be configured to route host traffic to the Pods on the public network through the host public network interface.

The recommended way to configure nodes is by using OpenShift NodeNetworkConfigurationPolicy objects. This method is supported for all OpenShift users regardless of deployment strategy. It requires the NMState Operator to be installed and enabled. For more information, see Kubernetes NMState Operator.

Each node must obtain an IP address on the ODF public network in the node public network address range. Static IP address management is the only IPAM method that is supported for any OpenShift cluster; thus, OpenShift Data Foundation supports only the static management method. This requires one NodeNetworkConfigurationPolicy object per host. A template that can be used to configure a host is shown below.

Important

The template below creates a “shim” interface on each host. The shim interface uses the host public network interface (for example, vlan220) as a parent. The static IP is given to the shim interface, not to the parent, and routing likewise uses the shim. This is a critical detail: macvlan prevents Pods connected to the virtual network on a given host from reaching that host directly or through a switch hairpin. Without the shim interface, OpenShift Data Foundation does not function properly. Do not attempt to set up the OpenShift Data Foundation Multus public network without configuring the shim interface.

apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: ceph-public-net-shim-{{NODE_NAME}}
spec:
  nodeSelector:
    node-role.kubernetes.io/worker: ""
    kubernetes.io/hostname: {{NODE_NAME}}
  desiredState:
    interfaces:
      - name: odf-pub-shim
        description: Shim interface to connect to ODF public network
        type: mac-vlan
        state: up
        mac-vlan:
          base-iface: vlan220 # host public network interface
          mode: bridge
          promiscuous: true
        ipv4:
          enabled: true
          dhcp: false
          address:
            - ip: 192.168.252.1 # static IP in node public network range
              prefix-length: 23 # node public network range mask
    routes:
      config:
        - destination: 192.168.240.0/21 # pod public network range
          next-hop-interface: odf-pub-shim

First, follow the comments in the template to update the base template for the environment being deployed. Then, for each node, copy the template and fill in {{NODE_NAME}} and a unique static IP.

After all the previous sections have been planned or executed, OpenShift Data Foundation preinstallation can begin. The following checklist of preinstallation items should be completed before proceeding further:

  1. Read the Multus architecture document in its entirety.
  2. Select which OpenShift Data Foundation Multus networks to use.
  3. Plan host interface connections.
  4. Deploy OpenShift.
  5. Select network address ranges to use.
  6. Write NetworkAttachmentDefinition manifests for Pods.
  7. Write NodeNetworkConfigurationPolicy manifests for nodes.
  8. Deploy the manifests to the OpenShift cluster.
  9. Install OpenShift Data Foundation, but do not deploy a StorageCluster yet.
  10. Run the Multus validation tool to check preinstallation deployments (covered in the next section).

6.9. Validating and debugging Multus configuration

OpenShift Data Foundation has a tool to perform basic connectivity tests prior to OpenShift Data Foundation StorageCluster deployment. This Multus validation tool is intended to be a self-help tool that returns a list of details that might be incorrect based on the validation steps that pass or fail.

This tool performs basic connection and IP availability checking. It cannot detect every possible misconfiguration, and it cannot detect network performance issues. Ensure due diligence for each cluster deployment.

When encountering an issue with the tool, check the tool output for suggestions. Use events associated with Pods the tool runs (for example, using oc -n openshift-storage describe pods) to debug further. If issues persist despite debugging, collect OpenShift Data Foundation must-gather, and open an OpenShift Data Foundation help ticket that includes full tool output and the must-gather tarball. For more information, see Downloading log files and diagnostic information using must-gather.

Read more about and download the ODF Multus validation tool from the knowledgebase article OpenShift Data Foundation - Multus prerequisite validation tool.

For best results, use a configuration file with the Multus validation tool. The following configuration is recommended; update it by following the comments as needed.

publicNetwork: "public-net" # comment out if public network is unused
# clusterNetwork: "cluster-net" # uncomment if used


# for airgapped environments, mirror this image, then update this config to reference the mirror
nginxImage: "quay.io/nginx/nginx-unprivileged:stable-alpine"

namespace: "openshift-storage"
serviceAccountName: "rook-ceph-system"

nodeTypes:
  storage-nodes:
    osdsPerNode: 12 # set to the max number of OSDs per storage node
    otherDaemonsPerNode: 16
    placement:
      nodeSelector:
        cluster.ocs.openshift.io/openshift-storage: ""
      tolerations:
        - {"key":"node.ocs.openshift.io/storage","value":"true"}
  worker-nodes:
    osdsPerNode: 0
    otherDaemonsPerNode: 6
    placement:
      nodeSelector:
      tolerations:

If this recommendation proves to be inadequate or out of date, use the tool to generate a new configuration using ./rook multus validation config dedicated-storage-nodes.
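For example, assuming the recommended configuration above is saved as multus-validation.yaml, the tool can be run against it with a command like the following sketch (the file name is illustrative, and flag names can vary by tool version; check the tool's help output):

```shell
# Run the validation tool with a configuration file (file name is illustrative)
./rook multus validation run --config multus-validation.yaml
```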

6.10. Deploying OpenShift Data Foundation with Multus

After validating connectivity using the Multus validation tool, the OpenShift Data Foundation storage cluster can be deployed. Refer to the OpenShift Data Foundation deployment documentation, and make sure to select the appropriate Multus networks during deployment.

6.10.1. Post-installation checks

In addition to the OpenShift Data Foundation health checking, ensure that Multus is configured as expected by performing the following:

  • Using the ODF toolbox or the odf-cli tool, run ceph osd dump to verify that the OSDs have IPs on the expected Multus networks.

    In the output, ensure that OSD daemons have an IP in the public network (if used). OSDs should additionally have an IP in the cluster network, if the Multus cluster network was used. If it was left empty, OSDs should only list public network IPs. If a filesystem has been deployed, ensure that MDS daemons have an IP in the public network (if used).
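The address check can be scripted. The following sketch greps a sample ceph osd dump line for an address in an assumed Multus cluster-network range (192.168.30.0/24); the sample line and address range are illustrative, not output from a real cluster. On a real cluster, pipe the actual ceph osd dump output from the toolbox instead.

```shell
# A single (abbreviated, illustrative) line of `ceph osd dump` output:
sample='osd.0 up in weight 1 ... public_addr 192.168.20.14:6802 cluster_addr 192.168.30.22:6801'

# Verify that the OSD lists a cluster_addr inside the Multus cluster network:
if echo "$sample" | grep -Eq 'cluster_addr 192\.168\.30\.'; then
  echo "osd.0: cluster network OK"
else
  echo "osd.0: no cluster network address found"
fi
```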

  • Create a test application Deployment in a new namespace that uses ODF PVCs.

    Ensure that the test application Pod can write to the mounted PVC volumes. Delete the Deployment’s Pod, then wait for it to reschedule to a new node. Ensure that the previous data can be read back, and ensure new data can be written.
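A minimal sketch of such a test PVC, assuming the default ODF RBD storage class name ocs-storagecluster-ceph-rbd and a hypothetical multus-test namespace:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: multus-test-pvc
  namespace: multus-test              # hypothetical test namespace
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 1Gi
  storageClassName: ocs-storagecluster-ceph-rbd  # default ODF RBD storage class
```

Mount this PVC in a simple Deployment, write a file to the volume, delete the Pod, and confirm that the file is still readable after the Pod reschedules.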

  • Run some test load generators in the cluster to write to OpenShift Data Foundation PVCs. After the OpenShift Data Foundation cluster has filled to 20%, induce a hard failure of a node or failure zone. Leave the node or zone in the failed state for an hour, and observe the OpenShift Data Foundation cluster status and OpenShift Data Foundation Pod statuses. Cluster health can be expected to show HEALTH_WARN, but it should not show HEALTH_ERR. One or two OpenShift Data Foundation Pod failures might be acceptable, but several failures could indicate a problem.
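One way to generate such load is a simple writer Job; the following is a sketch, assuming the hypothetical multus-test namespace and an existing ODF-backed PVC named multus-test-pvc (a dedicated load tool such as fio produces more realistic load patterns):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: odf-load-writer
  namespace: multus-test              # hypothetical test namespace
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: writer
          image: registry.access.redhat.com/ubi9/ubi-minimal
          # Write 1 GiB of random data to the ODF-backed volume
          command: ["sh", "-c", "dd if=/dev/urandom of=/data/fill.bin bs=1M count=1024"]
          volumeMounts:
            - name: data
              mountPath: /data
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: multus-test-pvc   # assumed ODF-backed PVC
```

Scale the data written, or run several copies of the Job against separate PVCs, until the cluster reaches the target 20% utilization.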