Chapter 4. Block Device Configuration

4.1. General Settings

rbd_op_threads
Description
The number of block device operation threads.
Type
Integer
Default
1
rbd_op_thread_timeout
Description
The timeout (in seconds) for block device operation threads.
Type
Integer
Default
60
rbd_non_blocking_aio
Description
If true, Ceph will process block device asynchronous I/O operations from a worker thread to prevent blocking.
Type
Boolean
Default
true
rbd_concurrent_management_ops
Description
The maximum number of concurrent management operations in flight (for example, deleting or resizing an image).
Type
Integer
Default
10
rbd_request_timed_out_seconds
Description
The number of seconds before a maintenance request times out.
Type
Integer
Default
30
rbd_clone_copy_on_read
Description
When set to true, copy-on-read cloning is enabled.
Type
Boolean
Default
false
rbd_enable_alloc_hint
Description
If true, allocation hinting is enabled, and the block device will issue a hint to the OSD backend to indicate the expected object size.
Type
Boolean
Default
true
rbd_skip_partial_discard
Description
If true, the block device will skip zeroing a range when trying to discard a range inside an object.
Type
Boolean
Default
false

4.2. Default Settings

You can override the default settings used when creating an image. By default, Ceph creates images with format 2 and no striping.

rbd_default_format
Description
The default format (2) used if no other format is specified. Format 1 is the original format for a new image, which is compatible with all versions of librbd and the kernel module, but does not support newer features like cloning. Format 2 is supported by librbd, and by the kernel module since kernel version 3.11 (except for striping). Format 2 adds support for cloning and is more easily extensible to allow more features in the future.
Type
Integer
Default
2
rbd_default_order
Description
The default order if no other order is specified.
Type
Integer
Default
22
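The order is the base-2 logarithm of the object size in bytes, so the default of 22 corresponds to 4 MiB objects. A minimal Python sketch of the conversion follows; the helper name is illustrative and not part of any Ceph API.

```python
def order_to_object_size(order):
    """Return the object size in bytes for an RBD order (size = 2**order)."""
    if order < 0:
        raise ValueError("order must be non-negative")
    return 1 << order


# The default order of 22 gives 4 MiB objects.
default_size = order_to_object_size(22)
print(default_size, "bytes =", default_size // (1024 * 1024), "MiB")
```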
rbd_default_stripe_count
Description
The default stripe count if no other stripe count is specified. Changing the default value requires the striping v2 feature.
Type
64-bit Unsigned Integer
Default
0
rbd_default_stripe_unit
Description
The default stripe unit if no other stripe unit is specified. Changing the unit from 0 (that is, the object size) requires the striping v2 feature.
Type
64-bit Unsigned Integer
Default
0
rbd_default_features
Description

The default features enabled when creating a block device image. This setting only applies to format 2 images. The settings are:

1: Layering support. Layering enables you to use cloning.

2: Striping v2 support. Striping spreads data across multiple objects. Striping helps with parallelism for sequential read/write workloads.

4: Exclusive locking support. When enabled, it requires a client to get a lock on an object before making a write.

8: Object map support. Block devices are thin provisioned, meaning they only store data that actually exists. Object map support helps track which objects actually exist (have data stored on a drive). Enabling object map support speeds up I/O operations for cloning, or importing and exporting a sparsely populated image.

16: Fast-diff support. Fast-diff support depends on object map support and exclusive lock support. It adds another property to the object map, which makes it much faster to generate diffs between snapshots of an image and to calculate the actual data usage of a snapshot.

32: Deep-flatten support. Deep-flatten makes rbd flatten work on all the snapshots of an image, in addition to the image itself. Without it, snapshots of an image still rely on the parent, so the parent cannot be deleted until the snapshots are deleted. Deep-flatten makes a parent independent of its clones, even if they have snapshots.

The enabled features are the sum of the numeric settings.

Type
Integer
Default
3, or layering support and striping support.
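Because each feature value is a distinct power of two, the numeric setting can be treated as a bitmask. The following Python sketch converts between the number and the feature names listed in the rbd_default_features description; the function names and the short feature labels are illustrative, not Ceph identifiers.

```python
# Feature values as listed in the rbd_default_features description.
RBD_FEATURES = {
    1: "layering",
    2: "striping v2",
    4: "exclusive lock",
    8: "object map",
    16: "fast-diff",
    32: "deep-flatten",
}


def decode_features(value):
    """Return the names of the features enabled in a numeric setting."""
    return [name for bit, name in RBD_FEATURES.items() if value & bit]


def encode_features(names):
    """Return the numeric setting (sum of values) for the given feature names."""
    by_name = {name: bit for bit, name in RBD_FEATURES.items()}
    return sum(by_name[name] for name in set(names))


print(decode_features(3))  # ['layering', 'striping v2']
print(encode_features(["layering", "exclusive lock", "object map", "fast-diff"]))  # 29
```

Fast-diff depends on object map and exclusive lock, so a setting that includes 16 should also include 8 and 4, as in the second call above.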
rbd_default_map_options
Description
The default options applied when mapping an image. Most of the options are useful mainly for debugging and benchmarking. See man rbd under Map Options for details.
Type
String
Default
""

4.3. Cache Settings

The user space implementation of the Ceph block device (that is, librbd) cannot take advantage of the Linux page cache, so it includes its own in-memory caching, called RBD caching. RBD caching behaves just like well-behaved hard disk caching. When the OS sends a barrier or a flush request, all dirty data is written to the OSDs. This means that using write-back caching is just as safe as using a well-behaved physical hard disk with a VM that properly sends flushes (that is, Linux kernel >= 2.6.32). The cache uses a Least Recently Used (LRU) algorithm, and in write-back mode it can coalesce contiguous requests for better throughput.

Ceph supports write-back caching for RBD. To enable it explicitly, add rbd cache = true to the [client] section of your ceph.conf file. Without caching, writes and reads go directly to the storage cluster, and writes return only when the data is on disk on all replicas. With caching enabled, writes return immediately, unless there are more than rbd cache max dirty unflushed bytes. In this case, the write triggers writeback and blocks until enough bytes are flushed.

Ceph supports write-through caching for RBD. You can set the size of the cache, and you can set targets and limits to switch from write-back caching to write-through caching. To enable write-through mode, set rbd cache max dirty to 0. This means writes return only when the data is on disk on all replicas, but reads may come from the cache. The cache is in memory on the client, and each RBD image has its own. Since the cache is local to the client, there is no coherency if other clients access the image. Running GFS or OCFS on top of RBD will not work with caching enabled.
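To illustrate what the resulting [client] section looks like, the following Python sketch renders one with the standard configparser module. It assumes the INI-style layout of ceph.conf; the function name is hypothetical, and the output is a fragment to merge into an existing file, not a complete configuration.

```python
import configparser
import io


def render_client_section(write_through=False):
    """Render a [client] section that enables RBD caching.

    With write_through=True, rbd cache max dirty is set to 0, which
    selects write-through mode as described above.
    """
    config = configparser.ConfigParser()
    config["client"] = {"rbd cache": "true"}
    if write_through:
        config["client"]["rbd cache max dirty"] = "0"
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


print(render_client_section(write_through=True))
```

The printed fragment contains the lines rbd cache = true and rbd cache max dirty = 0 under a [client] header.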

The ceph.conf file settings for RBD should be set in the [client] section of your configuration file. The settings include:

rbd cache
Description
Enable caching for RADOS Block Device (RBD).
Type
Boolean
Required
No
Default
true
rbd cache size
Description
The RBD cache size in bytes.
Type
64-bit Integer
Required
No
Default
32 MiB
rbd cache max dirty
Description
The dirty limit in bytes at which the cache triggers write-back. If 0, uses write-through caching.
Type
64-bit Integer
Required
No
Constraint
Must be less than rbd cache size.
Default
24 MiB
rbd cache target dirty
Description
The dirty target before the cache begins writing data to the data storage. Does not block writes to the cache.
Type
64-bit Integer
Required
No
Constraint
Must be less than rbd cache max dirty.
Default
16 MiB
rbd cache max dirty age
Description
The number of seconds dirty data is in the cache before writeback starts.
Type
Float
Required
No
Default
1.0
rbd_cache_max_dirty_object
Description
The dirty limit for objects. Set to 0 to calculate the limit automatically from rbd_cache_size.
Type
Integer
Default
0
rbd_cache_block_writes_upfront
Description
If true, it will block writes to the cache before the aio_write call completes. If false, it will block before the aio_completion is called.
Type
Boolean
Default
false
rbd cache writethrough until flush
Description
Start out in write-through mode, and switch to write-back after the first flush request is received. Enabling this is a conservative but safe setting in case VMs running on rbd are too old to send flushes, like the virtio driver in Linux before 2.6.32.
Type
Boolean
Required
No
Default
true
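The size constraints listed above (target dirty below max dirty, max dirty below cache size) and the write-through rule can be checked before a configuration is deployed. A minimal Python sketch follows; the function names are hypothetical and not part of any Ceph tooling, and the default arguments mirror the defaults listed in this section.

```python
MIB = 1024 * 1024


def check_cache_settings(cache_size=32 * MIB, max_dirty=24 * MIB,
                         target_dirty=16 * MIB):
    """Return a list of constraint violations (empty if consistent)."""
    problems = []
    if max_dirty >= cache_size:
        problems.append("rbd cache max dirty must be less than rbd cache size")
    # Assumption: the target is only checked in write-back mode, because
    # no target can be below a max dirty of 0 (write-through).
    if max_dirty > 0 and target_dirty >= max_dirty:
        problems.append(
            "rbd cache target dirty must be less than rbd cache max dirty")
    return problems


def cache_mode(max_dirty):
    """Return the caching mode implied by rbd cache max dirty."""
    return "write-through" if max_dirty == 0 else "write-back"


print(check_cache_settings())             # defaults satisfy both constraints
print(check_cache_settings(max_dirty=40 * MIB))
print(cache_mode(0), cache_mode(24 * MIB))
```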

4.4. Parent/Child Reads

rbd_balance_snap_reads
Description
Ceph typically reads objects from the primary OSD. Because snapshots are immutable, you may enable this feature to balance snap reads between the primary OSD and the replicas.
Type
Boolean
Default
false
rbd_localize_snap_reads
Description
Whereas rbd_balance_snap_reads will randomize the replica for reading a snapshot, if you enable rbd_localize_snap_reads, the block device will look to the CRUSH map to find the closest (local) OSD for reading the snapshot.
Type
Boolean
Default
false
rbd_balance_parent_reads
Description
Ceph typically reads objects from the primary OSD. Because parent images are immutable, you may enable this feature to balance parent reads between the primary OSD and the replicas.
Type
Boolean
Default
false
rbd_localize_parent_reads
Description
Whereas rbd_balance_parent_reads will randomize the replica for reading a parent, if you enable rbd_localize_parent_reads, the block device will look to the CRUSH map to find the closest (local) OSD for reading the parent.
Type
Boolean
Default
true

4.5. Read-ahead Settings

RBD supports read-ahead/prefetching to optimize small, sequential reads. This should normally be handled by the guest OS in the case of a VM, but boot loaders may not issue efficient reads. Read-ahead is automatically disabled if caching is disabled.

rbd readahead trigger requests
Description
Number of sequential read requests necessary to trigger read-ahead.
Type
Integer
Required
No
Default
10
rbd readahead max bytes
Description
Maximum size of a read-ahead request. If zero, read-ahead is disabled.
Type
64-bit Integer
Required
No
Default
512 KiB
rbd readahead disable after bytes
Description
After this many bytes have been read from an RBD image, read-ahead is disabled for that image until it is closed. This allows the guest OS to take over read-ahead once it is booted. If zero, read-ahead stays enabled.
Type
64-bit Integer
Required
No
Default
50 MiB
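The interaction of the three read-ahead settings can be pictured with a toy model. The Python sketch below is a simplification written for illustration, not librbd's implementation: the class and method names are hypothetical, and it assumes that every triggered read-ahead requests the full rbd readahead max bytes.

```python
class ReadAheadModel:
    """Toy model of the read-ahead settings described above."""

    def __init__(self, trigger_requests=10, max_bytes=512 * 1024,
                 disable_after_bytes=50 * 1024 * 1024):
        self.trigger_requests = trigger_requests
        self.max_bytes = max_bytes
        self.disable_after_bytes = disable_after_bytes
        self.sequential = 0       # length of the current sequential run
        self.next_offset = None   # offset that would continue the run
        self.total_read = 0       # bytes read since the image was opened

    def enabled(self):
        if self.max_bytes == 0:
            return False          # zero max bytes disables read-ahead
        if self.disable_after_bytes and self.total_read >= self.disable_after_bytes:
            return False          # the guest OS is expected to take over
        return True

    def on_read(self, offset, length):
        """Record a read and return the number of bytes to prefetch."""
        if offset == self.next_offset:
            self.sequential += 1
        else:
            self.sequential = 1   # a non-sequential read restarts the run
        self.next_offset = offset + length
        self.total_read += length
        if self.enabled() and self.sequential >= self.trigger_requests:
            return self.max_bytes
        return 0


model = ReadAheadModel(trigger_requests=3, max_bytes=1024, disable_after_bytes=0)
print([model.on_read(offset, 10) for offset in (0, 10, 20, 100)])
```

In this model the third sequential read triggers a prefetch, and the jump to offset 100 resets the sequential count.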

4.6. Blacklist

rbd_blacklist_on_break_lock
Description
Whether to blacklist clients whose lock was broken.
Type
Boolean
Default
true
rbd_blacklist_expire_seconds
Description
The number of seconds to blacklist a client. Set to 0 to use the OSD default.
Type
Integer
Default
0