Chapter 11. Pools overview


Ceph clients store data in pools. When you create pools, you are creating an I/O interface for clients to store data.

From the perspective of a Ceph client, that is, block device, gateway, and so on, interacting with the Ceph storage cluster is remarkably simple:

  • Create a cluster handle.
  • Connect the cluster handle to the cluster.
  • Create an I/O context for reading and writing objects and their extended attributes.

Creating a cluster handle and connecting to the cluster

To connect to the Ceph storage cluster, the Ceph client needs the following details:

  • The cluster name, which is ceph by default. The name is not usually specified, because it can sound ambiguous.
  • An initial monitor address.

Ceph clients usually retrieve these parameters by reading the Ceph configuration file from its default path, but a user can also specify them on the command line. The Ceph client also provides a user name and secret key; authentication is on by default. The client then contacts the Ceph Monitor cluster and retrieves a recent copy of the cluster map, including its monitors, OSDs, and pools.

Creating a pool I/O context

To read and write data, the Ceph client creates an I/O context to a specific pool in the Ceph storage cluster. If the specified user has permissions for the pool, the Ceph client can read from and write to the specified pool.

Ceph’s architecture enables the storage cluster to provide this remarkably simple interface to Ceph clients so that clients might select one of the sophisticated storage strategies you define simply by specifying a pool name and creating an I/O context. Storage strategies are invisible to the Ceph client in all but capacity and performance. Similarly, the complexities of Ceph clients, such as mapping objects into a block device representation or providing an S3/Swift RESTful service, are invisible to the Ceph storage cluster.

A pool provides you with resilience, placement groups, CRUSH rules, and quotas.

  • Resilience: You can set how many OSDs are allowed to fail without losing data. For replicated pools, it is the desired number of copies or replicas of an object. A typical configuration stores an object and one additional copy, that is, size = 2, but you can determine the number of copies or replicas. For erasure-coded pools, it is the number of coding chunks, that is, m = 2 in the erasure code profile.
  • Placement Groups: You can set the number of placement groups for the pool. A typical configuration uses approximately 50-100 placement groups per OSD to provide optimal balancing without using up too many computing resources. When setting up multiple pools, be careful to ensure you set a reasonable number of placement groups for both the pool and the cluster as a whole.
  • CRUSH Rules: When you store data in a pool, a CRUSH rule mapped to the pool enables CRUSH to identify the rule for the placement of each object and its replicas, or chunks for erasure coded pools, in your cluster. You can create a custom CRUSH rule for your pool.
  • Quotas: When you set quotas on a pool with the ceph osd pool set-quota command, you can limit the maximum number of objects or the maximum number of bytes stored in the specified pool.

11.1. Pools and storage strategies overview

To manage pools, you can list, create, and remove pools. You can also view the utilization statistics for each pool.

11.2. Listing pools

List your cluster’s pools:

Example

[ceph: root@host01 /]# ceph osd lspools

11.3. Creating a pool

Before creating pools, see the Configuration Guide for more details.

It is better to adjust the default value for the number of placement groups, because the default value might not suit your needs:

Example

[ceph: root@host01 /]# ceph config set global osd_pool_default_pg_num 250
[ceph: root@host01 /]# ceph config set global osd_pool_default_pgp_num 250

Create a replicated pool:

Syntax

ceph osd pool create POOL_NAME PG_NUM PGP_NUM [replicated] \
         [CRUSH_RULE_NAME] [EXPECTED_NUMBER_OBJECTS]
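
For example, a replicated pool named pool1 (a hypothetical name) with 128 placement groups might be created as follows:

Example

[ceph: root@host01 /]# ceph osd pool create pool1 128 128 replicated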

Create an erasure-coded pool:

Syntax

ceph osd pool create POOL_NAME PG_NUM PGP_NUM erasure \
         [ERASURE_CODE_PROFILE] [CRUSH_RULE_NAME] [EXPECTED_NUMBER_OBJECTS]
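
For example, an erasure-coded pool named ecpool1 (a hypothetical name) that uses the default erasure code profile might be created as follows:

Example

[ceph: root@host01 /]# ceph osd pool create ecpool1 32 32 erasure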

Create a bulk pool:

Syntax

ceph osd pool create POOL_NAME [--bulk]
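
For example, a bulk pool named bulkpool1 (a hypothetical name) might be created as follows:

Example

[ceph: root@host01 /]# ceph osd pool create bulkpool1 --bulk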

Where:

POOL_NAME
Description: The name of the pool. It must be unique.
Type: String
Required: Yes. If not specified, it is set to the default value.
Default: ceph

PG_NUM
Description: The total number of placement groups for the pool. See the Placement Groups section and the Ceph Placement Groups (PGs) per Pool Calculator for details on calculating a suitable number. The default value 8 is not suitable for most systems.
Type: Integer
Required: Yes
Default: 8

PGP_NUM
Description: The total number of placement groups for placement purposes. This value must be equal to the total number of placement groups, except for placement group splitting scenarios.
Type: Integer
Required: Yes. If not specified, it is set to the default value.
Default: 8

replicated or erasure
Description: The pool type, which can be either replicated, to recover from lost OSDs by keeping multiple copies of the objects, or erasure, to get a kind of generalized RAID5 capability. Replicated pools require more raw storage but implement all Ceph operations. Erasure-coded pools require less raw storage but implement only a subset of the available operations.
Type: String
Required: No
Default: replicated

CRUSH_RULE_NAME
Description: The name of the CRUSH rule for the pool. The rule must exist. For replicated pools, the name is the rule specified by the osd_pool_default_crush_rule configuration setting. For erasure-coded pools, the name is erasure-code if you specify the default erasure code profile, or POOL_NAME otherwise. Ceph creates this rule with the specified name implicitly if the rule does not already exist.
Type: String
Required: No
Default: Uses erasure-code for an erasure-coded pool. For replicated pools, it uses the value of the osd_pool_default_crush_rule variable from the Ceph configuration.

EXPECTED_NUMBER_OBJECTS
Description: The expected number of objects for the pool. Ceph splits the placement groups at pool creation time to avoid the latency impact of performing runtime directory splitting.
Type: Integer
Required: No
Default: 0, no splitting at pool creation time.

ERASURE_CODE_PROFILE
Description: For erasure-coded pools only. Use the erasure code profile. It must be an existing profile as defined by the osd erasure-code-profile set variable in the Ceph configuration file. For further information, see the Erasure Code Profiles section.
Type: String
Required: No

When you create a pool, set the number of placement groups to a reasonable value, for example to 100. Consider the total number of placement groups per OSD. Placement groups are computationally expensive, so performance degrades when you have many pools with many placement groups, for example, 50 pools with 100 placement groups each. The point of diminishing returns depends upon the power of the OSD host.

11.4. Setting pool quota

You can set pool quotas for the maximum number of bytes and the maximum number of objects per pool.

Syntax

ceph osd pool set-quota POOL_NAME [max_objects OBJECT_COUNT] [max_bytes BYTES]

Example

[ceph: root@host01 /]# ceph osd pool set-quota data max_objects 10000

To remove a quota, set its value to 0.
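
For example, continuing with the data pool from the example above, a quota of 10 GiB might be set and later removed as follows:

Example

[ceph: root@host01 /]# ceph osd pool set-quota data max_bytes 10737418240
[ceph: root@host01 /]# ceph osd pool set-quota data max_bytes 0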

Note

In-flight write operations might overrun pool quotas for a short time until Ceph propagates the pool usage across the cluster. This is normal behavior. Enforcing pool quotas on in-flight write operations would impose significant performance penalties.

11.5. Deleting a pool

Delete a pool:

Syntax

ceph osd pool delete POOL_NAME [POOL_NAME --yes-i-really-really-mean-it]

Important

To protect data, storage administrators cannot delete pools by default. Set the mon_allow_pool_delete configuration option to true before deleting pools.

If a pool has its own rule, consider removing it after deleting the pool. If a pool has users strictly for its own use, consider deleting those users after deleting the pool.
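
For example, a hypothetical pool named pool1 might be deleted as follows, after allowing pool deletion on the monitors:

Example

[ceph: root@host01 /]# ceph config set mon mon_allow_pool_delete true
[ceph: root@host01 /]# ceph osd pool delete pool1 pool1 --yes-i-really-really-mean-it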

11.6. Renaming a pool

Rename a pool:

Syntax

ceph osd pool rename CURRENT_POOL_NAME NEW_POOL_NAME
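
For example, a hypothetical pool named pool1 might be renamed to pool2 as follows:

Example

[ceph: root@host01 /]# ceph osd pool rename pool1 pool2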

If you rename a pool and you have per-pool capabilities for an authenticated user, you must update the user’s capabilities, that is, caps, with the new pool name.

11.7. Migrating a pool

Sometimes it is necessary to migrate all objects from one pool to another. This is done in cases where a parameter cannot be modified on an existing pool, for example, when you need to reduce the number of placement groups of a pool.

Important

When a workload is using only Ceph Block Device images, follow the procedures for moving and migrating a pool documented in the Red Hat Ceph Storage Block Device Guide.

The migration methods described for Ceph Block Device are recommended over those documented here. Using cppool does not preserve all snapshots and snapshot-related metadata, resulting in an unfaithful copy of the data. For example, copying an RBD pool does not completely copy the image; snapshots are not present and do not work properly. The cppool command also does not preserve the user_version field that some librados users may rely on.

If migrating a pool is necessary and your user workloads contain images other than Ceph Block Devices, continue with one of the procedures documented here.

Prerequisites

  • If using the rados cppool command:

    • Read-only access to the pool is required.
    • Only use this command if the pool does not contain RBD images and their snapshots, and if no librados users rely on the user_version field.
  • If using the local drive RADOS commands, verify that sufficient cluster space is available. Two, three, or more copies of the data will be present, according to the pool replication factor.

Procedure

Method one - the recommended direct way

Copy all objects with the rados cppool command.

Important

Read-only access to the pool is required during copy.

Syntax

ceph osd pool create NEW_POOL PG_NUM [ <other new pool parameters> ]
rados cppool SOURCE_POOL NEW_POOL
ceph osd pool rename SOURCE_POOL NEW_SOURCE_POOL_NAME
ceph osd pool rename NEW_POOL SOURCE_POOL

Example

[ceph: root@host01 /]# ceph osd pool create pool1 250
[ceph: root@host01 /]# rados cppool pool2 pool1
[ceph: root@host01 /]# ceph osd pool rename pool2 pool3
[ceph: root@host01 /]# ceph osd pool rename pool1 pool2

Method two - using a local drive

  1. Use the rados export and rados import commands and a temporary local directory to save all exported data.

    Syntax

    ceph osd pool create NEW_POOL PG_NUM [ <other new pool parameters> ]
    rados export --create SOURCE_POOL FILE_PATH
    rados import FILE_PATH NEW_POOL

    Example

    [ceph: root@host01 /]# ceph osd pool create pool1 250
    [ceph: root@host01 /]# rados export --create pool2 <path of export file>
    [ceph: root@host01 /]# rados import <path of export file> pool1

  2. Required. Stop all I/O to the source pool.
  3. Required. Resynchronize all modified objects.

    Syntax

    rados export --workers 5 SOURCE_POOL FILE_PATH
    rados import --workers 5 FILE_PATH NEW_POOL

    Example

    [ceph: root@host01 /]# rados export --workers 5 pool2 <path of export file>
    [ceph: root@host01 /]# rados import --workers 5 <path of export file> pool1

11.8. Viewing pool statistics

Show a pool’s utilization statistics:

Example

[ceph: root@host01 /]# rados df

11.9. Setting pool values

Set a value to a pool:

Syntax

ceph osd pool set POOL_NAME KEY VALUE
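
For example, the NOSCRUB flag might be set on a hypothetical pool named pool1 as follows:

Example

[ceph: root@host01 /]# ceph osd pool set pool1 noscrub 1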

The Pool Values section lists all key-value pairs that you can set.

11.10. Getting pool values

Get a value from a pool:

Syntax

ceph osd pool get POOL_NAME KEY
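
For example, the replica count of a hypothetical pool named pool1 might be retrieved as follows:

Example

[ceph: root@host01 /]# ceph osd pool get pool1 size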

You can view the list of all key-value pairs that you can get in the Pool Values section.

11.11. Enabling a client application

Red Hat Ceph Storage provides additional protection for pools to prevent unauthorized types of clients from writing data to the pool. This means that system administrators must expressly enable pools to receive I/O operations from the Ceph Block Device, Ceph Object Gateway, Ceph Filesystem, or a custom application.

Enable a client application to conduct I/O operations on a pool:

Syntax

ceph osd pool application enable POOL_NAME APP {--yes-i-really-mean-it}

Where APP is:

  • cephfs for the Ceph Filesystem.
  • rbd for the Ceph Block Device.
  • rgw for the Ceph Object Gateway.
Note

Specify a different APP value for a custom application.
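
For example, a hypothetical pool named pool1 intended for the Ceph Block Device might be enabled as follows:

Example

[ceph: root@host01 /]# ceph osd pool application enable pool1 rbd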

Important

A pool that does not have an application enabled generates a HEALTH_WARN status. In that scenario, the output of ceph health detail -f json-pretty is similar to the following:

{
    "checks": {
        "POOL_APP_NOT_ENABLED": {
            "severity": "HEALTH_WARN",
            "summary": {
                "message": "application not enabled on 1 pool(s)"
            },
            "detail": [
                {
                    "message": "application not enabled on pool '_POOL_NAME_'"
                },
                {
                    "message": "use 'ceph osd pool application enable _POOL_NAME_ _APP_', where _APP_ is 'cephfs', 'rbd', 'rgw', or freeform for custom applications."
                }
            ]
        }
    },
    "status": "HEALTH_WARN",
    "overall_status": "HEALTH_WARN",
    "detail": [
        "'ceph health' JSON format has changed in luminous. If you see this your monitoring system is scraping the wrong fields. Disable this with 'mon health preluminous compat warning = false'"
    ]
}
Note

Initialize pools for the Ceph Block Device with rbd pool init POOL_NAME.

11.12. Disabling a client application

Disable a client application from conducting I/O operations on a pool:

Syntax

ceph osd pool application disable POOL_NAME APP {--yes-i-really-mean-it}

Where APP is:

  • cephfs for the Ceph Filesystem.
  • rbd for the Ceph Block Device.
  • rgw for the Ceph Object Gateway.
Note

Specify a different APP value for a custom application.
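
For example, the Ceph Block Device application might be disabled on a hypothetical pool named pool1 as follows:

Example

[ceph: root@host01 /]# ceph osd pool application disable pool1 rbd --yes-i-really-mean-it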

11.13. Setting application metadata

You can set key-value pairs that describe attributes of the client application.

Set client application metadata on a pool:

Syntax

ceph osd pool application set POOL_NAME APP KEY VALUE

Where APP is:

  • cephfs for the Ceph Filesystem.
  • rbd for the Ceph Block Device
  • rgw for the Ceph Object Gateway
Note

Specify a different APP value for a custom application.
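
For example, an arbitrary key-value pair, here a hypothetical owner key, might be set for the rbd application on a hypothetical pool named pool1 as follows:

Example

[ceph: root@host01 /]# ceph osd pool application set pool1 rbd owner qa-team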

11.14. Removing application metadata

Remove client application metadata from a pool:

Syntax

ceph osd pool application rm POOL_NAME APP KEY

Where APP is:

  • cephfs for the Ceph Filesystem.
  • rbd for the Ceph Block Device
  • rgw for the Ceph Object Gateway
Note

Specify a different APP value for a custom application.
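
For example, the hypothetical owner key from the previous section might be removed from the rbd application on pool1 as follows:

Example

[ceph: root@host01 /]# ceph osd pool application rm pool1 rbd owner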

11.15. Setting the number of object replicas

Set the number of object replicas on a replicated pool:

Syntax

ceph osd pool set POOL_NAME size NUMBER_OF_REPLICAS

You can run this command for each pool.

Important

The NUMBER_OF_REPLICAS parameter includes the object itself. If you want to include the object and two copies of the object for a total of three instances of the object, specify 3.

Example

[ceph: root@host01 /]# ceph osd pool set data size 3

Note

An object might accept I/O operations in degraded mode with fewer replicas than specified by the pool size setting. To set a minimum number of required replicas for I/O, use the min_size setting.

Example

[ceph: root@host01 /]# ceph osd pool set data min_size 2

This ensures that no object in the data pool receives an I/O with fewer replicas than specified by the min_size setting.

11.16. Getting the number of object replicas

Get the number of object replicas:

Example

[ceph: root@host01 /]# ceph osd dump | grep 'replicated size'

Ceph lists the pools, with the replicated size attribute highlighted. By default, Ceph creates two replicas of an object, that is, a total of three copies, or a size of 3.

11.17. Pool values

The following list contains key-value pairs that you can set or get. For further information, see the Setting Pool Values and Getting Pool Values sections.

Table 11.1. Available pool values

size
Description: Specifies the number of replicas for objects in the pool. See the Setting the Number of Object Replicas section for further details. Applicable to replicated pools only.
Type: Integer. Required: No. Default: None.

min_size
Description: Specifies the minimum number of replicas required for I/O. See the Setting the Number of Object Replicas section for further details. For erasure-coded pools, this should be set to a value greater than k. If I/O is allowed at the value k, then there is no redundancy and data is lost in the event of a permanent OSD failure. For more information, see Erasure code pools overview.
Type: Integer. Required: No. Default: None.

crash_replay_interval
Description: Specifies the number of seconds to allow clients to replay acknowledged, but uncommitted requests.
Type: Integer. Required: No. Default: None.

pg_num
Description: The total number of placement groups for the pool. See the Pool, placement groups, and CRUSH Configuration Reference section in the Red Hat Ceph Storage Configuration Guide for details on calculating a suitable number. The default value 8 is not suitable for most systems.
Type: Integer. Required: Yes. Default: 8.

pgp_num
Description: The total number of placement groups for placement purposes. This should be equal to the total number of placement groups, except for placement group splitting scenarios. Valid range: equal to or less than the value specified by the pg_num variable.
Type: Integer. Required: Yes. Picks up the default or Ceph configuration value if not specified. Default: None.

crush_rule
Description: The rule to use for mapping object placement in the cluster.
Type: String. Required: Yes. Default: None.

hashpspool
Description: Enable or disable the HASHPSPOOL flag on a given pool. With this option enabled, pool hashing and placement group mapping are changed to improve the way pools and placement groups overlap. Valid settings: 1 enables the flag, 0 disables the flag. IMPORTANT: Do not enable this option on production pools of a cluster with a large amount of OSDs and data. All placement groups in the pool would have to be remapped, causing too much data movement.
Type: Integer. Required: No. Default: None.

fast_read
Description: On a pool that uses erasure coding, if this flag is enabled, the read request issues subsequent reads to all shards, and waits until it receives enough shards to decode to serve the client. In the case of the jerasure and isa erasure plug-ins, once the first K replies return, the client’s request is served immediately using the data decoded from these replies. This helps to allocate some resources for better performance. Currently, this flag is only supported for erasure coding pools.
Type: Boolean. Required: No. Default: 0.

allow_ec_overwrites
Description: Whether writes to an erasure coded pool can update part of an object, so the Ceph Filesystem and Ceph Block Device can use it.
Type: Boolean. Required: No. Default: None.

compression_algorithm
Description: Sets the inline compression algorithm to use with the BlueStore storage backend. This setting overrides the bluestore_compression_algorithm configuration setting. Valid settings: lz4, snappy, zlib, zstd.
Type: String. Required: No. Default: None.

compression_mode
Description: Sets the policy for the inline compression algorithm for the BlueStore storage backend. This setting overrides the bluestore_compression_mode configuration setting. Valid settings: none, passive, aggressive, force.
Type: String. Required: No. Default: None.

compression_min_blob_size
Description: BlueStore does not compress chunks smaller than this size. This setting overrides the bluestore_compression_min_blob_size configuration setting.
Type: Unsigned Integer. Required: No. Default: None.

compression_max_blob_size
Description: BlueStore breaks chunks larger than this size into smaller blobs of compression_max_blob_size before compressing the data.
Type: Unsigned Integer. Required: No. Default: None.

nodelete
Description: Set or unset the NODELETE flag on a given pool. Valid range: 1 sets the flag. 0 unsets the flag.
Type: Integer. Required: No. Default: None.

nopgchange
Description: Set or unset the NOPGCHANGE flag on a given pool.
Type: Integer. Required: No. Default: None.

nosizechange
Description: Set or unset the NOSIZECHANGE flag on a given pool. Valid range: 1 sets the flag. 0 unsets the flag.
Type: Integer. Required: No. Default: None.

write_fadvise_dontneed
Description: Set or unset the WRITE_FADVISE_DONTNEED flag on a given pool. Valid range: 1 sets the flag. 0 unsets the flag.
Type: Integer. Required: No. Default: None.

noscrub
Description: Set or unset the NOSCRUB flag on a given pool. Valid range: 1 sets the flag. 0 unsets the flag.
Type: Integer. Required: No. Default: None.

nodeep-scrub
Description: Set or unset the NODEEP_SCRUB flag on a given pool. Valid range: 1 sets the flag. 0 unsets the flag.
Type: Integer. Required: No. Default: None.

scrub_min_interval
Description: The minimum interval in seconds for pool scrubbing when load is low. If it is 0, Ceph uses the osd_scrub_min_interval configuration setting.
Type: Double. Required: No. Default: 0.

scrub_max_interval
Description: The maximum interval in seconds for pool scrubbing irrespective of cluster load. If it is 0, Ceph uses the osd_scrub_max_interval configuration setting.
Type: Double. Required: No. Default: 0.

deep_scrub_interval
Description: The interval in seconds for pool 'deep' scrubbing. If it is 0, Ceph uses the osd_deep_scrub_interval configuration setting.
Type: Double. Required: No. Default: 0.

peering_crush_bucket_count
Description: This value is used along with peering_crush_bucket_barrier to determine whether the set of OSDs in the chosen acting set can peer with each other, based on the number of distinct buckets in the acting set.
Type: Integer. Required: No. Default: None.

peering_crush_bucket_target
Description: This value is used along with peering_crush_bucket_barrier and size to calculate the value bucket_max, which limits the number of OSDs in the same bucket from being chosen for the acting set of a PG.
Type: Integer. Required: No. Default: None.

peering_crush_bucket_barrier
Description: The type of bucket a pool is stretched across. For example, rack, row, or datacenter.
Type: String. Required: No. Default: None.
