Chapter 6. OSD Configuration Reference
You can configure Ceph OSDs in the Ceph configuration file, but Ceph OSDs can use the default values and a very minimal configuration. A minimal Ceph OSD configuration sets the osd journal size and osd host options, and uses default values for almost everything else.
Ceph OSDs are numerically identified in incremental fashion, beginning with 0, using the following convention:
osd.0
osd.1
osd.2
In a configuration file, you can specify settings for all Ceph OSDs in the cluster by adding configuration settings to the [osd] section of the configuration file. To add settings directly to a particular Ceph OSD (for example, osd host), enter them in a section specific only to that OSD in the Ceph configuration file. For example:
[osd]
osd journal size = 1024

[osd.0]
osd host = osd-host-a

[osd.1]
osd host = osd-host-b
6.1. General Settings
The following settings provide a Ceph OSD’s ID, and determine paths to data and journals. Ceph deployment scripts typically generate the UUID automatically.
Red Hat does not recommend changing the default paths for data or journals, as it makes it more problematic to troubleshoot Ceph later.
The journal size should be at least twice the product of the expected drive speed and the value of the filestore max sync interval option. However, the most common practice is to partition the journal drive (often an SSD), and mount it such that Ceph uses the entire partition for the journal.
- osd_uuid
- Description
- The universally unique identifier (UUID) for the Ceph OSD.
- Type
- UUID
- Default
- The UUID.
- Note
-
The osd uuid applies to a single Ceph OSD. The fsid applies to the entire cluster.
- osd_data
- Description
- The path to the OSD’s data. You must create the directory when deploying Ceph. Mount a drive for OSD data at this mount point. Red Hat does not recommend changing the default.
- Type
- String
- Default
-
/var/lib/ceph/osd/$cluster-$id
- osd_max_write_size
- Description
- The maximum size of a write in megabytes.
- Type
- 32-bit Integer
- Default
-
90
- osd_client_message_size_cap
- Description
- The largest client data message allowed in memory.
- Type
- 64-bit Integer Unsigned
- Default
-
500 MB.
500*1024L*1024L
- osd_class_dir
- Description
- The class path for RADOS class plug-ins.
- Type
- String
- Default
-
$libdir/rados-classes
6.2. Journal Settings
By default, Ceph expects that you will store a Ceph OSD’s journal with the following path:
/var/lib/ceph/osd/$cluster-$id/journal
Without performance optimization, Ceph stores the journal on the same disk as the Ceph OSD’s data. A Ceph OSD optimized for performance can use a separate disk to store journal data; for example, a solid-state drive delivers high-performance journaling.
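For example, a minimal sketch of dedicating an SSD partition to a single OSD’s journal (/dev/sdb1 is a hypothetical device name; see the osd_journal option below):
[osd.0]
osd journal = /dev/sdb1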
To size a journal, find the product of the filestore max sync interval option and the expected throughput, and multiply the product by two (2):
osd journal size = <2 * (expected throughput * filestore max sync interval)>
The expected throughput number should include the expected disk throughput (that is, sustained data transfer rate) and network throughput. For example, a 7200 RPM disk will likely have approximately 100 MB/s. Taking the min() of the disk and network throughput should provide a reasonable expected throughput. Some users just start off with a 10 GB journal size. For example:
osd journal size = 10000
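As a minimal sketch of the calculation, assume a 7200 RPM disk sustaining roughly 100 MB/s, a network that can deliver at least as much, and a filestore max sync interval of 5 seconds:
expected throughput = min(disk, network) = 100 MB/s
osd journal size = 2 * (100 * 5) = 1000
[osd]
osd journal size = 1000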
Sizing the journal correctly for your OSDs is important. Using a small journal will lead to a slower recovery in the event of an OSD failure: the number of recovery threads has to be decreased to keep pressure on the journal at an acceptable level and maintain a stable recovery. Also, committing transactions to the file store will be slower, and could lead to the file store hanging if the queued transaction size is bigger than the journal size.
- osd_journal
- Description
-
The path to the OSD’s journal. This may be a path to a file or a block device (such as a partition of an SSD). If it is a file, you must create the directory to contain it. We recommend using a drive separate from the osd data drive.
- Type
- String
- Default
-
/var/lib/ceph/osd/$cluster-$id/journal
- osd_journal_size
- Description
The size of the journal in megabytes. If the journal is a block device, this setting is ignored and the entire block device is used.
- Type
- 32-bit Integer
- Default
-
5120
- Recommended
-
Begin with 1 GB. The value should be at least twice the product of the expected drive speed and the filestore max sync interval value.
6.3. Scrubbing
In addition to making multiple copies of objects, Ceph ensures data integrity by scrubbing placement groups. Ceph scrubbing is analogous to the fsck command on the object storage layer.
For each placement group, Ceph generates a catalog of all objects and compares each primary object and its replicas to ensure that no objects are missing or mismatched.
Light scrubbing (daily) checks the object size and attributes. Deep scrubbing (weekly) reads the data and uses checksums to ensure data integrity.
Scrubbing is important for maintaining data integrity, but it can reduce performance. Adjust the following settings to increase or decrease scrubbing operations.
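As an illustrative sketch (the values are examples, not recommendations), constraining scrubs to the early morning and skipping them when the system is under load might look like this:
[osd]
osd max scrubs = 1
osd scrub begin hour = 1
osd scrub end hour = 6
osd scrub load threshold = 0.5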
- osd_max_scrubs
- Description
- The maximum number of simultaneous scrub operations for a Ceph OSD.
- Type
- 32-bit Int
- Default
-
1
- osd_scrub_thread_timeout
- Description
- The maximum time in seconds before timing out a scrub thread.
- Type
- 32-bit Integer
- Default
-
60
- osd_scrub_finalize_thread_timeout
- Description
- The maximum time in seconds before timing out a scrub finalize thread.
- Type
- 32-bit Integer
- Default
-
60*10
- osd_scrub_begin_hour
- Description
-
The earliest hour that light or deep scrubbing can begin. It is used with the osd scrub end hour parameter to define a scrubbing time window and allows constraining scrubbing to off-peak hours. The setting takes an integer to specify the hour on the 24-hour cycle, where 0 represents the hour from 12:01 a.m. to 1:00 a.m., 13 represents the hour from 1:01 p.m. to 2:00 p.m., and so on.
- Type
- 32-bit Integer
- Default
-
0
for 12:01 a.m. to 1:00 a.m.
- osd_scrub_end_hour
- Description
-
The latest hour that light or deep scrubbing can begin. It is used with the osd scrub begin hour parameter to define a scrubbing time window and allows constraining scrubbing to off-peak hours. The setting takes an integer to specify the hour on the 24-hour cycle, where 0 represents the hour from 12:01 a.m. to 1:00 a.m., 13 represents the hour from 1:01 p.m. to 2:00 p.m., and so on. The end hour must be greater than the begin hour.
- Type
- 32-bit Integer
- Default
-
24
for 11:01 p.m. to 12:00 a.m.
- osd_scrub_load_threshold
- Description
-
The maximum load. Ceph will not scrub when the system load (as defined by the getloadavg() function) is higher than this number. Default is 0.5.
- Type
- Float
- Default
-
0.5
- osd_scrub_min_interval
- Description
- The minimum interval in seconds for scrubbing the Ceph OSD when the Red Hat Ceph Storage cluster load is low.
- Type
- Float
- Default
-
Once per day.
60*60*24
- osd_scrub_max_interval
- Description
- The maximum interval in seconds for scrubbing the Ceph OSD irrespective of cluster load.
- Type
- Float
- Default
-
Once per week.
7*60*60*24
- osd_scrub_interval_randomize_ratio
- Description
-
Takes the ratio and randomizes the scheduled scrub between osd scrub min interval and osd scrub max interval.
- Type
- Float
- Default
-
0.5
- mon_warn_not_scrubbed
- Description
-
Number of seconds after osd_scrub_interval to warn about any PGs that were not scrubbed.
- Type
- Integer
- Default
-
0
(no warning).
- osd_scrub_chunk_min
- Description
-
The object store is partitioned into chunks which end on hash boundaries. For chunky scrubs, Ceph scrubs objects one chunk at a time with writes blocked for that chunk. The osd scrub chunk min setting represents the minimum number of chunks to scrub.
- Type
- 32-bit Integer
- Default
-
5
- osd_scrub_chunk_max
- Description
- The maximum number of chunks to scrub.
- Type
- 32-bit Integer
- Default
-
25
- osd_scrub_sleep
- Description
- The time to sleep between deep scrub operations.
- Type
- Float
- Default
-
0
(or off).
- osd_scrub_during_recovery
- Description
- Allows scrubbing during recovery.
- Type
- Bool
- Default
-
false
- osd_scrub_invalid_stats
- Description
- Forces extra scrub to fix stats marked as invalid.
- Type
- Bool
- Default
-
true
- osd_scrub_priority
- Description
- Controls queue priority of scrub operations versus client I/O.
- Type
- Unsigned 32-bit Integer
- Default
-
5
- osd_scrub_cost
- Description
- Cost of scrub operations in megabytes for queue scheduling purposes.
- Type
- Unsigned 32-bit Integer
- Default
-
50 << 20
- osd_deep_scrub_interval
- Description
-
The interval for deep scrubbing, that is, fully reading all data. The osd scrub load threshold parameter does not affect this setting.
- Type
- Float
- Default
-
Once per week.
60*60*24*7
- osd_deep_scrub_stride
- Description
- Read size when doing a deep scrub.
- Type
- 32-bit Integer
- Default
-
512 KB.
524288
- mon_warn_not_deep_scrubbed
- Description
-
Number of seconds after osd_deep_scrub_interval to warn about any PGs that were not scrubbed.
- Type
- Integer
- Default
-
0
(no warning).
- osd_deep_scrub_randomize_ratio
- Description
-
The rate at which scrubs will randomly become deep scrubs (even before osd_deep_scrub_interval has passed).
- Type
- Float
- Default
-
0.15
or 15%.
- osd_deep_scrub_update_digest_min_age
- Description
- How many seconds old objects must be before scrub updates the whole-object digest.
- Type
- Integer
- Default
-
2*60*60
(2 hours).
6.4. Operations
Operations settings allow you to configure the number of threads for servicing requests. If you set the osd op threads parameter to 0, it disables multi-threading.
By default, Ceph uses two threads with a 30 second timeout and a 30 second complaint time if an operation does not complete within those time parameters. Set operations priority weights between client operations and recovery operations to ensure optimal performance during recovery.
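As an illustrative sketch (the values are examples, not recommendations), weighting client operations well above recovery operations might look like this:
[osd]
osd client op priority = 63
osd recovery op priority = 1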
- osd_op_threads
- Description
-
The number of threads to service Ceph OSD operations. Set to 0 to disable it. Increasing the number might increase the request processing rate.
- Type
- 32-bit Integer
- Default
-
2
- osd_client_op_priority
- Description
-
The priority set for client operations. It is relative to osd recovery op priority.
- Type
- 32-bit Integer
- Default
-
63
- Valid Range
- 1-63
- osd_recovery_op_priority
- Description
-
The priority set for recovery operations. It is relative to osd client op priority.
- Type
- 32-bit Integer
- Default
-
3
- Valid Range
- 1-63
- osd_op_thread_timeout
- Description
- The Ceph OSD operation thread timeout in seconds.
- Type
- 32-bit Integer
- Default
-
30
- osd_op_complaint_time
- Description
An operation becomes complaint-worthy after the specified number of seconds has elapsed.
- Type
- Float
- Default
-
30
- osd_disk_threads
- Description
- The number of disk threads, which are used to perform background disk intensive OSD operations such as scrubbing and snap trimming.
- Type
- 32-bit Integer
- Default
-
1
- osd_disk_thread_ioprio_class
- Description
Sets the ioprio_set(2) I/O scheduling class for the disk thread. Acceptable values are:
- idle
- be
- rt
The idle class means the disk thread will have lower priority than any other thread in the OSD. This is useful to slow down scrubbing on an OSD that is busy handling client operations.
The be class is the default and is the same priority as all other threads in the OSD.
The rt class means the disk thread will have precedence over all other threads in the OSD. This is useful if scrubbing is much needed and must make progress at the expense of client operations.
- Type
- String
- Default
- an empty string
- osd_disk_thread_ioprio_priority
- Description
-
It sets the ioprio_set(2) I/O scheduling priority of the disk thread, ranging from 0 (highest) to 7 (lowest). If all OSDs on a given host were in class idle and compete for I/O due to controller congestion, it can be used to lower the disk thread priority of one OSD to 7 so that another OSD with priority 0 can potentially scrub faster.
- Type
- Integer in the range of 0 to 7 or -1 if not to be used.
- Default
-
-1
The osd disk thread ioprio class and osd disk thread ioprio priority options will only be used if both are set to a non-default value. In addition, they only work with the Linux Kernel CFQ scheduler.
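For example, a sketch of lowering the disk thread priority of one busy OSD at runtime through the injectargs mechanism (osd.0 is a hypothetical daemon ID):
ceph tell osd.0 injectargs '--osd_disk_thread_ioprio_class idle --osd_disk_thread_ioprio_priority 7'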
- osd_op_history_size
- Description
- The maximum number of completed operations to track.
- Type
- 32-bit Unsigned Integer
- Default
-
20
- osd_op_history_duration
- Description
- The oldest completed operation to track.
- Type
- 32-bit Unsigned Integer
- Default
-
600
- osd_op_log_threshold
- Description
- How many operations logs to display at once.
- Type
- 32-bit Integer
- Default
-
5
- osd_op_timeout
- Description
- The time in seconds after which running OSD operations time out.
- Type
- Integer
- Default
-
0
Do not set the osd op timeout option unless your clients can handle the consequences. For example, setting this parameter on clients running in virtual machines can lead to data corruption because the virtual machines interpret this timeout as a hardware failure.
6.5. Backfilling
When you add Ceph OSDs to a cluster or remove them from the cluster, the CRUSH algorithm rebalances the cluster by moving placement groups to or from Ceph OSDs to restore the balance. The process of migrating placement groups and the objects they contain can reduce the cluster operational performance considerably. To maintain operational performance, Ceph performs this migration with the 'backfill' process, which allows Ceph to set backfill operations to a lower priority than requests to read or write data.
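As an illustrative sketch (the values are examples, not recommendations), keeping backfill gentle on a busy cluster might look like this:
[osd]
osd max backfills = 1
osd backfill scan min = 64
osd backfill scan max = 512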
- osd_max_backfills
- Description
- The maximum number of backfill operations allowed to or from a single OSD.
- Type
- 64-bit Unsigned Integer
- Default
-
1
- osd_backfill_scan_min
- Description
- The minimum number of objects per backfill scan.
- Type
- 32-bit Integer
- Default
-
64
- osd_backfill_scan_max
- Description
- The maximum number of objects per backfill scan.
- Type
- 32-bit Integer
- Default
-
512
- osd_backfill_full_ratio
- Description
- Refuse to accept backfill requests when the Ceph OSD’s full ratio is above this value.
- Type
- Float
- Default
-
0.85
- osd_backfill_retry_interval
- Description
- The number of seconds to wait before retrying backfill requests.
- Type
- Double
- Default
-
10.0
6.6. OSD Map
OSD maps reflect the OSD daemons operating in the cluster. Over time, the number of map epochs increases. Ceph provides the following settings to ensure that Ceph performs well as the OSD map grows larger.
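As an illustrative sketch (the values are hypothetical, not recommendations), a cluster with many OSDs might raise the map caches along these lines:
[osd]
osd map cache size = 1024
osd map cache bl size = 128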
- osd_map_dedup
- Description
- Enable removing duplicates in the OSD map.
- Type
- Boolean
- Default
-
true
- osd_map_cache_size
- Description
- The size of the OSD map cache in megabytes.
- Type
- 32-bit Integer
- Default
-
200
- osd_map_cache_bl_size
- Description
- The size of the in-memory OSD map cache in OSD daemons.
- Type
- 32-bit Integer
- Default
-
50
- osd_map_cache_bl_inc_size
- Description
- The size of the in-memory OSD map cache incrementals in OSD daemons.
- Type
- 32-bit Integer
- Default
-
100
- osd_map_message_max
- Description
- The maximum map entries allowed per MOSDMap message.
- Type
- 32-bit Integer
- Default
-
100
6.7. Recovery
When the cluster starts, or when a Ceph OSD terminates unexpectedly and restarts, the OSD begins peering with other Ceph OSDs before write operations can occur.
If a Ceph OSD crashes and comes back online, usually it will be out of sync with other Ceph OSDs containing more recent versions of objects in the placement groups. When this happens, the Ceph OSD goes into recovery mode and seeks to get the latest copy of the data and bring its map back up to date. Depending upon how long the Ceph OSD was down, the OSD’s objects and placement groups may be significantly out of date. Also, if a failure domain went down (for example, a rack), more than one Ceph OSD may come back online at the same time. This can make the recovery process time consuming and resource intensive.
To maintain operational performance, Ceph performs recovery with limitations on the number of recovery requests, threads, and object chunk sizes, which allows Ceph to perform well in a degraded state.
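As an illustrative sketch (the values are examples, not recommendations), throttling recovery to protect client I/O might look like this:
[osd]
osd recovery max active = 1
osd recovery op priority = 1
osd recovery delay start = 10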
- osd_recovery_delay_start
- Description
- After peering completes, Ceph will delay for the specified number of seconds before starting to recover objects.
- Type
- Float
- Default
-
0
- osd_recovery_max_active
- Description
The number of active recovery requests per OSD at one time. More requests will accelerate recovery, but the requests place an increased load on the cluster.
- Type
- 32-bit Integer
- Default
-
3
- osd_recovery_max_chunk
- Description
- The maximum size of a recovered chunk of data to push.
- Type
- 64-bit Integer Unsigned
- Default
-
8 << 20
- osd_recovery_threads
- Description
- The number of threads for recovering data.
- Type
- 32-bit Integer
- Default
-
1
- osd_recovery_thread_timeout
- Description
- The maximum time in seconds before timing out a recovery thread.
- Type
- 32-bit Integer
- Default
-
30
- osd_recover_clone_overlap
- Description
-
Preserves clone overlap during recovery. Should always be set to true.
- Type
- Boolean
- Default
-
true
6.8. Miscellaneous
- osd_snap_trim_thread_timeout
- Description
- The maximum time in seconds before timing out a snap trim thread.
- Type
- 32-bit Integer
- Default
-
60*60*1
- osd_pg_max_concurrent_snap_trims
- Description
The maximum number of parallel snap trims per PG. This controls how many objects per PG to trim at once.
- Type
- 32-bit Integer
- Default
-
2
- osd_snap_trim_sleep
- Description
Inserts a sleep between every trim operation a PG issues.
- Type
- 32-bit Integer
- Default
-
0
- osd_max_trimming_pgs
- Description
The maximum number of trimming PGs.
- Type
- 32-bit Integer
- Default
-
2
- osd_backlog_thread_timeout
- Description
- The maximum time in seconds before timing out a backlog thread.
- Type
- 32-bit Integer
- Default
-
60*60*1
- osd_default_notify_timeout
- Description
- The OSD default notification timeout (in seconds).
- Type
- 32-bit Integer Unsigned
- Default
-
30
- osd_check_for_log_corruption
- Description
- Check log files for corruption. Can be computationally expensive.
- Type
- Boolean
- Default
-
false
- osd_remove_thread_timeout
- Description
- The maximum time in seconds before timing out a remove OSD thread.
- Type
- 32-bit Integer
- Default
-
60*60
- osd_command_thread_timeout
- Description
- The maximum time in seconds before timing out a command thread.
- Type
- 32-bit Integer
- Default
-
10*60
- osd_command_max_records
- Description
- Limits the number of lost objects to return.
- Type
- 32-bit Integer
- Default
-
256
- osd_auto_upgrade_tmap
- Description
-
Uses tmap for omap on old objects.
- Type
- Boolean
- Default
-
true
- osd_tmapput_sets_users_tmap
- Description
-
Uses tmap for debugging only.
- Type
- Boolean
- Default
-
false
- osd_preserve_trimmed_log
- Description
- Preserves trimmed log files, but uses more disk space.
- Type
- Boolean
- Default
-
false