Chapter 10. Logging and Debugging

Usually, you add debugging settings to the Ceph configuration at runtime. If you encounter issues when starting your cluster, you can also add Ceph debug logging to the Ceph configuration file. By default, Ceph writes its log files under /var/log/ceph.

Tip

When debug output slows down your system, the latency can hide race conditions.

Logging is resource intensive. If you are encountering a problem in a specific area of your cluster, enable logging for that area of the cluster. For example, if your OSDs are running fine, but your gateways are not, you should start by enabling debug logging for the specific gateway instance(s) giving you trouble. Enable logging for each subsystem as needed.

Important

Verbose logging can generate over 1 GB of data per hour. If your OS disk reaches its capacity, the node will stop working.

If you enable or increase the rate of Ceph logging, ensure that you have sufficient disk space on your OS disk. See Accelerating Log Rotation for details on rotating log files. When your system is running well, remove unnecessary debugging settings to ensure your cluster runs optimally. Logging debug output is relatively slow and wastes resources during normal operation.

See Subsystems, Log and Debug Settings for details on available settings.

10.1. Runtime

To see the configuration settings at runtime, log in to a host with a running daemon and execute the following command:

ceph --admin-daemon </path/to/admin/socket> config show | less
ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config show | less
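The full config show output is long. As a minimal sketch, assuming the output lists one "name": "value" pair per line (pretty-printed JSON), the helper below keeps only the debug settings. The function name filter_debug_settings is hypothetical and not part of Ceph.

```shell
# Hypothetical helper: keep only the debug_* entries from `config show` output.
# Assumes one "name": "value" pair per line, as in pretty-printed JSON.
filter_debug_settings() {
    grep -E '^[[:space:]]*"debug_[a-z0-9_]+"[[:space:]]*:'
}

# Usage on a host with a running daemon (not executed here):
#   ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config show | filter_debug_settings
```

Filtering on the debug_ prefix is a convenience only; any setting whose name does not start with debug_ is dropped from the listing.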

To activate Ceph’s debugging output (dout()) at runtime, use the ceph tell command to inject arguments into the runtime configuration:

ceph tell <daemon-type>.<daemon id or *> injectargs --<name> <value> [--<name> <value>]

Replace <daemon-type> with osd or mon. To apply the runtime setting to all daemons of a particular type, use *. To apply it to a single daemon, specify that daemon’s ID (that is, its number or letter). For example, to increase debug logging for a ceph-osd daemon named osd.0:

ceph tell osd.0 injectargs --debug-osd 0/5

The ceph tell command goes through the monitors. If you cannot bind to the monitor, you can still make the change by logging in to the host of the daemon whose configuration you want to change and using the ceph --admin-daemon command. For example:

sudo ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config set debug_osd 0/5
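The two approaches above can be combined. The following is a sketch only: it assumes that ceph tell exits with a non-zero status when it cannot reach the monitors (it may instead block until a timeout), and that the admin socket follows the /var/run/ceph/ceph-<type>.<id>.asok pattern used in the examples. The function name set_debug and the CEPH variable are hypothetical.

```shell
# Sketch: try `ceph tell` first, then fall back to the local admin socket.
# CEPH is an illustrative override so the function can be exercised
# without a cluster; it defaults to the real `ceph` command.
CEPH=${CEPH:-ceph}

# set_debug <daemon-type> <id> <subsystem> <level>
# e.g. set_debug osd 0 osd 0/5
set_debug() (
    dtype=$1 id=$2 subsys=$3 level=$4
    # Assumed socket location, following the examples in this section.
    sock="/var/run/ceph/ceph-$dtype.$id.asok"
    "$CEPH" tell "$dtype.$id" injectargs "--debug-$subsys" "$level" ||
        "$CEPH" --admin-daemon "$sock" config set "debug_$subsys" "$level"
)
```

The fallback only works on the host where the daemon runs, and the admin socket usually requires root privileges, so run the function as root in that case.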

See Subsystems, Log and Debug Settings for details on available settings.

10.2. Boot Time

To activate Ceph’s debugging output (dout()) at boot time, add the debug settings to the Ceph configuration file. Set subsystems that are common to every daemon under the [global] section. Set subsystems for a particular daemon type under that daemon’s section (for example, [mon] or [osd]). For example:

[global]
debug_ms = 1/5

[mon]
debug_mon = 20
debug_paxos = 1/5
debug_auth = 2

[osd]
debug_osd = 1/5
debug_filestore = 1/5
debug_journal = 1
debug_monc = 5/20
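Because boot-time debug settings are easy to forget once a problem is solved, it can help to list them. The following sketch prints every debug_ setting in a configuration file together with its section. The function name list_debug_settings is hypothetical, and the parser assumes the underscore spelling and one setting per line, as in the example above.

```shell
# list_debug_settings <config-file>
# Prints "<section> <setting> <value>" for each debug_* line in the file.
list_debug_settings() {
    awk '
        /^[ \t]*\[/ {
            # Remember the current section name without the brackets.
            section = $0
            sub(/^[ \t]*\[/, "", section)
            sub(/\].*$/, "", section)
            next
        }
        /^[ \t]*debug_/ {
            line = $0
            gsub(/[ \t]/, "", line)          # e.g. "debug_ms=1/5"
            n = index(line, "=")
            if (n > 0) print section, substr(line, 1, n - 1), substr(line, n + 1)
        }
    ' "$1"
}

# Usage, assuming the configuration file is at its usual location:
#   list_debug_settings /etc/ceph/ceph.conf
```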

See Subsystems, Log and Debug Settings for details on available settings.

10.3. Accelerating Log Rotation

If your OS disk is relatively full, you can accelerate log rotation by modifying the Ceph log rotation file at /etc/logrotate.d/ceph. Add a size setting after the rotation frequency so that logs are rotated as soon as they exceed that size, and use the Cron utility to run the check more often. For example, the default setting looks like this:

rotate 7
weekly
compress
sharedscripts

Modify the configuration by adding the size setting.

rotate 7
weekly
size 500M
compress
sharedscripts

Then, start the crontab editor for your user.

crontab -e

Finally, add an entry to check the /etc/logrotate.d/ceph file.

30 * * * * /usr/sbin/logrotate /etc/logrotate.d/ceph >/dev/null 2>&1

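The preceding example checks the /etc/logrotate.d/ceph file at 30 minutes past every hour, that is, once per hour. To check every 30 minutes instead, use */30 in the minute field.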
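When choosing a size setting, it can help to see which log files are already large. The following sketch lists files under a directory that exceed a threshold given in MiB. The function name logs_over_size is hypothetical; the threshold is converted to the 512-byte blocks that find -size uses by default.

```shell
# logs_over_size <dir> <size-in-MiB>
# Lists regular files under <dir> that are larger than the threshold.
# find -size counts 512-byte blocks by default, so 1 MiB = 2048 blocks.
logs_over_size() {
    find "$1" -type f -size +"$(( $2 * 2048 ))"
}

# Usage on a Ceph node (not executed here):
#   logs_over_size /var/log/ceph 500
```

To preview what logrotate would do without changing any files, its debug mode can be used, for example logrotate -d /etc/logrotate.d/ceph.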

10.4. Valgrind

Debugging might also require you to track down memory and threading issues. You can run a single daemon, a type of daemon, or the whole cluster with the Valgrind utility. You should only use Valgrind when developing or debugging Ceph. Valgrind is computationally expensive, and will slow down your system otherwise. Valgrind messages are logged to stderr.
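As an illustration of running a single daemon under Valgrind, the sketch below only prints a command line so that it can be reviewed first. It assumes the daemon is started in the foreground with -f and selected with -i, and that the regular instance of the daemon has been stopped; packaged installations may need further options. The function name valgrind_cmd is hypothetical. The memcheck tool targets memory errors and helgrind targets threading errors.

```shell
# valgrind_cmd <tool> <daemon-type> <id>
# Prints (does not run) a command that starts one Ceph daemon under Valgrind.
# -f keeps the daemon in the foreground so Valgrind follows the same process.
valgrind_cmd() {
    echo "valgrind --tool=$1 ceph-$2 -f -i $3"
}

# Usage:
#   valgrind_cmd memcheck osd 0    # memory errors
#   valgrind_cmd helgrind osd 0    # threading errors
```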

10.5. Subsystems, Log and Debug Settings

In most cases, you enable debug logging output for individual subsystems, and only on a temporary basis.

10.5.1. Subsystems

Each subsystem has one logging level for the messages written to its log file and another for the messages kept in memory. You can set these two levels independently for each subsystem. Ceph’s logging levels operate on a scale of 1 to 20, where 1 is terse and 20 is verbose.

A debug logging setting can take a single value, which sets the log level and the memory level to the same value. For example, if you specify debug_ms = 5, Ceph treats it as a log level and a memory level of 5. You can also specify the two levels separately, with a forward slash (/) between them: the first value is the log level, and the second value is the memory level. For example, to set the ms subsystem’s log level to 1 and its memory level to 5, specify debug_ms = 1/5. The general form is:

debug_<subsystem> = <log-level>/<memory-level>
# for example
debug_osd = 1/20
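The rule for single and paired values can be sketched as a small shell function. The name split_debug_level is hypothetical; it only illustrates how a value maps to a log level and a memory level, and is not how Ceph parses the setting internally.

```shell
# split_debug_level <value>
# Prints "<log-level> <memory-level>". A single value applies to both.
split_debug_level() {
    case $1 in
        */*) echo "${1%%/*} ${1#*/}" ;;   # "1/5" -> log 1, memory 5
        *)   echo "$1 $1" ;;              # "5"   -> log 5, memory 5
    esac
}

# Usage:
#   split_debug_level 1/5
#   split_debug_level 5
```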

The following table provides a list of Ceph subsystems and their default log and memory levels. Once you complete your logging efforts, restore the subsystems to their default level or to a level suitable for normal operations.

Subsystem	Log Level	Memory Level
`default`	0	5
`lockdep`	0	1
`context`	0	1
`crush`	0	1
`buffer`	0	0
`timer`	0	1
`filer`	0	1
`striper`	0	1
`objecter`	0	1
`rados`	0	5
`rbd`	0	5
`journaler`	0	5
`objectcacher`	0	5
`client`	0	5
`osd`	0	5
`optracker`	0	5
`objclass`	0	5
`filestore`	1	3
`journal`	1	3
`ms`	0	5
`mon`	1	5
`monc`	0	10
`paxos`	1	5
`tp`	0	5
`auth`	1	5
`finisher`	1	1
`heartbeatmap`	1	5
`perfcounter`	1	5
`rgw`	1	5
`civetweb`	1	10
`javaclient`	1	5
`asok`	1	5
`throttle`	1	1
`refs`	0	0
`xio`	1	5
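To restore defaults after a debugging session, the values from the table can be fed back through ceph tell. The sketch below covers only four OSD-related subsystems, with levels copied from the table, and prints the commands rather than running them. The names OSD_DEFAULTS and print_restore_commands are hypothetical.

```shell
# Default "log/memory" levels copied from the table above for a few
# OSD-related subsystems. Extend the list as needed.
OSD_DEFAULTS="osd=0/5 filestore=1/3 journal=1/3 ms=0/5"

# print_restore_commands <daemon>   e.g. print_restore_commands osd.0
# Prints one `ceph tell` command per subsystem so they can be reviewed first.
print_restore_commands() {
    for pair in $OSD_DEFAULTS; do
        echo "ceph tell $1 injectargs --debug-${pair%%=*} ${pair#*=}"
    done
}
```

After reviewing the output, the commands can be run by piping them to a shell, for example print_restore_commands osd.0 | sh.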

Here are examples of the type of messages you will see in the logs when the verbosity is increased for the monitors and the OSDs.

Monitor Debug Settings

debug_ms = 5
debug_mon = 20
debug_paxos = 20
debug_auth = 20

Example Log Output

2016-02-12 12:37:04.278761 7f45a9afc700 10 mon.cephn2@0(leader).osd e322 e322: 2 osds: 2 up, 2 in
2016-02-12 12:37:04.278792 7f45a9afc700 10 mon.cephn2@0(leader).osd e322  min_last_epoch_clean 322
2016-02-12 12:37:04.278795 7f45a9afc700 10 mon.cephn2@0(leader).log v1010106 log
2016-02-12 12:37:04.278799 7f45a9afc700 10 mon.cephn2@0(leader).auth v2877 auth
2016-02-12 12:37:04.278811 7f45a9afc700 20 mon.cephn2@0(leader) e1 sync_trim_providers
2016-02-12 12:37:09.278914 7f45a9afc700 11 mon.cephn2@0(leader) e1 tick
2016-02-12 12:37:09.278949 7f45a9afc700 10 mon.cephn2@0(leader).pg v8126 v8126: 64 pgs: 64 active+clean; 60168 kB data, 172 MB used, 20285 MB / 20457 MB avail
2016-02-12 12:37:09.278975 7f45a9afc700 10 mon.cephn2@0(leader).paxosservice(pgmap 7511..8126) maybe_trim trim_to 7626 would only trim 115 < paxos_service_trim_min 250
2016-02-12 12:37:09.278982 7f45a9afc700 10 mon.cephn2@0(leader).osd e322 e322: 2 osds: 2 up, 2 in
2016-02-12 12:37:09.278989 7f45a9afc700  5 mon.cephn2@0(leader).paxos(paxos active c 1028850..1029466) is_readable = 1 - now=2016-02-12 12:37:09.278990 lease_expire=0.000000 has v0 lc 1029466
....
2016-02-12 12:59:18.769963 7f45a92fb700  1 -- 192.168.0.112:6789/0 <== osd.1 192.168.0.114:6800/2801 5724 ==== pg_stats(0 pgs tid 3045 v 0) v1 ==== 124+0+0 (2380105412 0 0) 0x5d96300 con 0x4d5bf40
2016-02-12 12:59:18.770053 7f45a92fb700  1 -- 192.168.0.112:6789/0 --> 192.168.0.114:6800/2801 -- pg_stats_ack(0 pgs tid 3045) v1 -- ?+0 0x550ae00 con 0x4d5bf40
2016-02-12 12:59:32.916397 7f45a9afc700  0 mon.cephn2@0(leader).data_health(1) update_stats avail 53% total 1951 MB, used 780 MB, avail 1053 MB
....
2016-02-12 13:01:05.256263 7f45a92fb700  1 -- 192.168.0.112:6789/0 --> 192.168.0.113:6800/2410 -- mon_subscribe_ack(300s) v1 -- ?+0 0x4f283c0 con 0x4d5b440

OSD Debug Settings

debug_ms = 5
debug_osd = 20
debug_filestore = 20
debug_journal = 20

Example Log Output

2016-02-12 11:27:53.869151 7f5d55d84700  1 -- 192.168.17.3:0/2410 --> 192.168.17.4:6801/2801 -- osd_ping(ping e322 stamp 2016-02-12 11:27:53.869147) v2 -- ?+0 0x63baa00 con 0x578dee0
2016-02-12 11:27:53.869214 7f5d55d84700  1 -- 192.168.17.3:0/2410 --> 192.168.0.114:6801/2801 -- osd_ping(ping e322 stamp 2016-02-12 11:27:53.869147) v2 -- ?+0 0x638f200 con 0x578e040
2016-02-12 11:27:53.870215 7f5d6359f700  1 -- 192.168.17.3:0/2410 <== osd.1 192.168.0.114:6801/2801 109210 ==== osd_ping(ping_reply e322 stamp 2016-02-12 11:27:53.869147) v2 ==== 47+0+0 (261193640 0 0) 0x63c1a00 con 0x578e040
2016-02-12 11:27:53.870698 7f5d6359f700  1 -- 192.168.17.3:0/2410 <== osd.1 192.168.17.4:6801/2801 109210 ==== osd_ping(ping_reply e322 stamp 2016-02-12 11:27:53.869147) v2 ==== 47+0+0 (261193640 0 0) 0x6313200 con 0x578dee0
....
2016-02-12 11:28:10.432313 7f5d6e71f700  5 osd.0 322 tick
2016-02-12 11:28:10.432375 7f5d6e71f700 20 osd.0 322 scrub_random_backoff lost coin flip, randomly backing off
2016-02-12 11:28:10.432381 7f5d6e71f700 10 osd.0 322 do_waiters -- start
2016-02-12 11:28:10.432383 7f5d6e71f700 10 osd.0 322 do_waiters -- finish
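In the sample output above, each line appears to consist of a date, a time, a thread ID, the level of the message, and the message text. Assuming that layout, the following sketch filters a verbose log down to messages at or below a chosen level. The function name filter_by_level is hypothetical.

```shell
# filter_by_level <max-level> [log-file ...]
# Keeps lines whose fourth field (the message level) is <= max-level.
# Reads standard input when no file is given. Lines without a numeric
# fourth field, such as "....", are dropped.
filter_by_level() (
    max=$1
    shift
    awk -v max="$max" '$4 ~ /^-?[0-9]+$/ && $4 + 0 <= max + 0' "$@"
)

# Usage on a node (the log file name follows the log_file default):
#   filter_by_level 5 /var/log/ceph/ceph-osd.0.log
```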

10.5.2. Logging Settings

Logging and debugging settings are not required in a Ceph configuration file, but you can override default settings as needed. Ceph supports the following settings:

log_file

Description: The location of the logging file for your cluster.
Type: String
Required: No
Default: /var/log/ceph/$cluster-$name.log

log_max_new

Description: The maximum number of new log files.
Type: Integer
Required: No
Default: 1000

log_max_recent

Description: The maximum number of recent events to include in a log file.
Type: Integer
Required: No
Default: 1000000

log_to_stderr

Description: Determines if logging messages appear in stderr.
Type: Boolean
Required: No
Default: true

err_to_stderr

Description: Determines if error messages appear in stderr.
Type: Boolean
Required: No
Default: true

log_to_syslog

Description: Determines if logging messages appear in syslog.
Type: Boolean
Required: No
Default: false

err_to_syslog

Description: Determines if error messages appear in syslog.
Type: Boolean
Required: No
Default: false

log_flush_on_exit

Description: Determines if Ceph flushes the log files after exit.
Type: Boolean
Required: No
Default: true

clog_to_monitors

Description: Determines if clog messages will be sent to monitors.
Type: Boolean
Required: No
Default: true

clog_to_syslog

Description: Determines if clog messages will be sent to syslog.
Type: Boolean
Required: No
Default: false

mon_cluster_log_to_syslog

Description: Determines if the cluster log will be output to syslog.
Type: Boolean
Required: No
Default: false

mon_cluster_log_file

Description: The location of the cluster’s log file.
Type: String
Required: No
Default: /var/log/ceph/$cluster.log
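The log_file and mon_cluster_log_file defaults contain metavariables. As a sketch, assuming $cluster stands for the cluster name and $name for the daemon type and ID (for example, osd.0), the following function shows the path that a default resolves to. The function name expand_log_file is hypothetical; Ceph performs this expansion itself.

```shell
# expand_log_file <template> <cluster> <daemon-name>
# Substitutes the $cluster and $name metavariables in a path template.
expand_log_file() {
    printf '%s\n' "$1" |
        sed -e 's|\$cluster|'"$2"'|g' -e 's|\$name|'"$3"'|g'
}

# Usage (single quotes keep the shell from expanding $cluster and $name):
#   expand_log_file '/var/log/ceph/$cluster-$name.log' ceph osd.0
```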

10.5.2.1. OSD

osd_preserve_trimmed_log

Description: Preserves trimmed logs after trimming.
Type: Boolean
Required: No
Default: false

osd_tmapput_sets_uses_tmap

Description: Uses tmap. For debug only.
Type: Boolean
Required: No
Default: false

osd_min_pg_log_entries

Description: The minimum number of log entries for placement groups.
Type: 32-bit Unsigned Integer
Required: No
Default: 1000

osd_op_log_threshold

Description: How many op log messages to show up in one pass.
Type: Integer
Required: No
Default: 5

10.5.2.2. File Store

filestore_debug_omap_check

Description: Debugging check on synchronization. This is an expensive operation.
Type: Boolean
Required: No
Default: false

10.5.2.3. RADOS Gateway

rgw_log_nonexistent_bucket

Description: Log non-existent buckets.
Type: Boolean
Required: No
Default: false

rgw_log_object_name

Description: The logging format for an object’s name.
Type: String
Required: No
Default: %Y-%m-%d-%H-%i-%n

rgw_log_object_name_utc

Description: Object log name contains UTC.
Type: Boolean
Required: No
Default: false

rgw_enable_ops_log

Description: Enables logging of every RGW operation.
Type: Boolean
Required: No
Default: true

rgw_enable_usage_log

Description: Enable logging of RGW’s bandwidth usage.
Type: Boolean
Required: No
Default: true

rgw_usage_log_flush_threshold

Description: Threshold to flush pending log data.
Type: Integer
Required: No
Default: 1024

rgw_usage_log_tick_interval

Description: The interval, in seconds, at which pending usage log data is flushed.
Type: Integer
Required: No
Default: 30

rgw_intent_log_object_name

Description: The logging format for the intent log object name.
Type: String
Required: No
Default: %Y-%m-%d-%i-%n

rgw_intent_log_object_name_utc

Description: Include a UTC time stamp in the intent log object name.
Type: Boolean
Required: No
Default: false
