Chapter 10. Logging and Debugging

Usually, you add debugging to the Ceph configuration at runtime. If you encounter issues when starting your cluster, you can instead add Ceph debug logging to the Ceph configuration file. Ceph log files are stored under /var/log/ceph.

Tip

When debug output slows down your system, the latency can hide race conditions.

Logging is resource intensive. If you are encountering a problem in a specific area of your cluster, enable logging for that area of the cluster. For example, if your OSDs are running fine, but your gateways are not, you should start by enabling debug logging for the specific gateway instance(s) giving you trouble. Enable logging for each subsystem as needed.

Important

Verbose logging can generate over 1GB of data per hour. If your OS disk reaches its capacity, the node will stop working.

If you enable or increase the rate of Ceph logging, ensure that you have sufficient disk space on your OS disk. See Accelerating Log Rotation for details on rotating log files. When your system is running well, remove unnecessary debugging settings so that your cluster runs optimally. Writing debug output is relatively slow and wastes resources during normal operation.
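One way to check headroom before raising log levels is to look at how full the file system holding the Ceph logs is. The following is only a sketch: the function names and the 80 percent threshold are illustrative assumptions, not Ceph settings, and the parsing relies on the POSIX `df -P` output format.

```shell
#!/bin/sh
# Print the used-space percentage (without the % sign) of the file system
# that holds the given directory. Defaults to /var/log/ceph.
log_fs_used_pct() {
    dir="${1:-/var/log/ceph}"
    # df -P prints one header line and one data line; column 5 is capacity.
    df -P "$dir" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Hypothetical guard: warn when the file system is more than 80% full.
check_log_space() {
    used=$(log_fs_used_pct "$1")
    [ -n "$used" ] || { echo "error: cannot read disk usage" >&2; return 1; }
    if [ "$used" -gt 80 ]; then
        echo "warning: ${1:-/var/log/ceph} file system is ${used}% full"
    else
        echo "ok: ${used}% used"
    fi
}

# Example: check_log_space /var/log/ceph
```

Running the check before and during a debugging session gives early warning that verbose logging is filling the disk.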

See Subsystem, Log and Debug Settings for details on available settings.

10.1. Runtime

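To see the configuration settings at runtime, log in to a host with a running daemon and run the following command, replacing the placeholder with the path to the daemon's admin socket. The second line shows the command for a daemon named osd.0: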

ceph --admin-daemon </path/to/admin/socket> config show | less
ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config show | less
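Because `config show` prints every setting, it can help to narrow the output to the debug entries. The sketch below assumes the command prints one quoted `"name": "value"` pair per line; the function name `show_debug_settings` is hypothetical.

```shell
#!/bin/sh
# Print only the debug_* entries from a daemon's runtime configuration.
# $1 is the path to the daemon's admin socket.
show_debug_settings() {
    sock="$1"
    # Assumption: each setting appears on its own line as "name": "value".
    ceph --admin-daemon "$sock" config show | grep '"debug_'
}

# Example (run on the host where osd.0 is running):
# show_debug_settings /var/run/ceph/ceph-osd.0.asok
```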

To activate Ceph’s debugging output (dout()) at runtime, use the ceph tell command to inject arguments into the runtime configuration:

ceph tell <daemon-type>.<daemon id or *> injectargs --<name> <value> [--<name> <value>]

Replace <daemon-type> with osd or mon. To apply the runtime setting to all daemons of a particular type, use * as the ID. To target a single daemon, specify its ID (that is, its number or letter). For example, to increase debug logging for a ceph-osd daemon named osd.0:

ceph tell osd.0 injectargs --debug-osd 0/5
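When several subsystems need adjusting at once, the repeated `--<name> <value>` pairs can be generated by a small wrapper. This is a minimal sketch: `inject_debug` and its `subsystem=level` argument format are hypothetical conveniences, not part of the ceph CLI. Note that the target is quoted so the shell does not expand `*` as a file glob.

```shell
#!/bin/sh
# Inject one or more debug settings into running daemons through the monitors.
# Usage: inject_debug <target> <subsystem>=<level> [<subsystem>=<level> ...]
# <target> is a daemon such as osd.0, or 'osd.*' for every OSD.
inject_debug() {
    target="$1"
    shift
    n=$#
    # Rewrite each subsystem=level argument as "--debug-<subsystem> <level>".
    while [ "$n" -gt 0 ]; do
        set -- "$@" "--debug-${1%%=*}" "${1#*=}"
        shift
        n=$((n - 1))
    done
    ceph tell "$target" injectargs "$@"
}

# Example: raise OSD and messenger logging on every OSD.
# inject_debug 'osd.*' osd=0/5 ms=1/5
```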

The ceph tell command goes through the monitors. If you cannot connect to a monitor, you can still make the change: log in to the host of the daemon whose configuration you want to change and use the ceph --admin-daemon command. For example:

sudo ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config set debug_osd 0/5

See Subsystem, Log and Debug Settings for details on available settings.

10.2. Boot Time

To activate Ceph’s debugging output (dout()) at boot time, add the debug settings to the Ceph configuration file. Subsystems common to each daemon can be set under the [global] section in the configuration file. Subsystems for particular daemons are set under the daemon section in the configuration file (that is, [mon], [osd]). For example:

[global]
debug_ms = 1/5

[mon]
debug_mon = 20
debug_paxos = 1/5
debug_auth = 2

[osd]
debug_osd = 1/5
debug_filestore = 1/5
debug_journal = 1
debug_monc = 5/20

See Subsystem, Log and Debug Settings for details on available settings.
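Debug settings left in the configuration file persist across restarts, so it is worth auditing the file once troubleshooting is finished. The sketch below lists every debug setting together with the section it belongs to. The function name `list_debug_settings` is hypothetical, and /etc/ceph/ceph.conf is assumed as the default path; pass another path if your configuration file lives elsewhere.

```shell
#!/bin/sh
# List the debug settings in a Ceph configuration file, prefixed by section.
# Matches both "debug_ms" and "debug ms" spellings.
list_debug_settings() {
    conf="${1:-/etc/ceph/ceph.conf}"   # assumed default location
    awk '
        # Remember the most recent [section] header.
        /^[ \t]*\[/ { section = $0; gsub(/[ \t]/, "", section); next }
        # Print each debug setting with its section.
        /^[ \t]*debug[_ ]/ {
            line = $0
            sub(/^[ \t]+/, "", line)
            print section " " line
        }
    ' "$conf"
}

# Example: list_debug_settings /etc/ceph/ceph.conf
```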

10.3. Accelerating Log Rotation

If your OS disk is relatively full, you can accelerate log rotation by modifying the Ceph log rotation file at /etc/logrotate.d/ceph. Add a size setting after the rotation frequency so that logs are rotated as soon as they exceed that size, and use the Cron utility to run the check more often. For example, the default setting looks like this:

rotate 7
weekly
compress
sharedscripts

Modify the configuration by adding the size setting.

rotate 7
weekly
size 500M
compress
sharedscripts

Then, start the crontab editor for your user.

crontab -e

Finally, add an entry to check the /etc/logrotate.d/ceph file.

30 * * * * /usr/sbin/logrotate /etc/logrotate.d/ceph >/dev/null 2>&1

The preceding example checks the /etc/logrotate.d/ceph file once an hour, at 30 minutes past the hour. To check every 30 minutes instead, use */30 in the minute field.

10.4. Valgrind

Debugging might also require you to track down memory and threading issues. You can run a single daemon, a type of daemon, or the whole cluster with the Valgrind utility. Use Valgrind only when developing or debugging Ceph, because it is computationally expensive and slows down your system. Valgrind messages are logged to stderr.

10.5. Subsystems, Log and Debug Settings

In most cases, you enable debug logging output for individual subsystems, and only on a temporary basis.

10.5.1. Subsystems

Each subsystem has one logging level for the messages written to its log file and another for the messages it keeps in memory. You can set these two levels independently for each subsystem. Ceph’s logging levels operate on a scale of 1 to 20, where 1 is terse and 20 is verbose.

A debug logging setting can take a single value, which sets the log level and the memory level to the same value. For example, if you specify debug_ms = 5, Ceph will treat it as a log level and a memory level of 5. You can also specify them separately. The first setting is the log level, and the second setting is the memory level. You must separate them with a forward slash (/). For example, if you want to set the ms subsystem’s debug logging level to 1 and its memory level to 5, you would specify it as debug_ms = 1/5. The general form is:

debug_<subsystem> = <log-level>/<memory-level>
#for example
debug_osd = 1/20
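To make the two forms concrete, the following sketch splits a setting value into its log level and memory level, applying the rule that a single value sets both. The function name `parse_debug_level` is hypothetical and exists only for illustration.

```shell
#!/bin/sh
# Split a debug setting value into its log level and memory level.
# A single value (for example "5") applies to both levels.
parse_debug_level() {
    value="$1"
    case "$value" in
        */*)
            log="${value%%/*}"   # text before the slash
            mem="${value#*/}"    # text after the slash
            ;;
        *)
            log="$value"
            mem="$value"
            ;;
    esac
    echo "log=$log memory=$mem"
}

parse_debug_level 1/20   # prints: log=1 memory=20
parse_debug_level 5      # prints: log=5 memory=5
```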

The following table provides a list of Ceph subsystems and their default log and memory levels. Once you complete your logging efforts, restore the subsystems to their default level or to a level suitable for normal operations.

Subsystem       Log Level   Memory Level

default         0           5
lockdep         0           1
context         0           1
crush           0           1
buffer          0           0
timer           0           1
filer           0           1
striper         0           1
objecter        0           1
rados           0           5
rbd             0           5
journaler       0           5
objectcacher    0           5
client          0           5
osd             0           5
optracker       0           5
objclass        0           5
filestore       1           3
journal         1           3
ms              0           5
mon             1           5
monc            0           10
paxos           1           5
tp              0           5
auth            1           5
finisher        1           1
heartbeatmap    1           5
perfcounter     1           5
rgw             1           5
civetweb        1           10
javaclient      1           5
asok            1           5
throttle        1           1
refs            0           0
xio             1           5
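After a debugging session, the raised levels can be put back in one step. The sketch below restores four OSD-related subsystems to the defaults listed in the table above (osd 0/5, filestore 1/3, journal 1/3, ms 0/5). The function name `restore_osd_debug_defaults` is hypothetical, and the option spellings other than --debug-osd are assumed to follow the same --debug-<subsystem> pattern.

```shell
#!/bin/sh
# Restore OSD-related debug levels to the defaults from the subsystem table.
# $1 is the target daemon, for example osd.0 or 'osd.*' (quote the asterisk).
restore_osd_debug_defaults() {
    target="${1:-osd.0}"
    ceph tell "$target" injectargs \
        --debug-osd 0/5 \
        --debug-filestore 1/3 \
        --debug-journal 1/3 \
        --debug-ms 0/5
}

# Example: restore_osd_debug_defaults 'osd.*'
```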

Here are examples of the type of messages you will see in the logs when the verbosity is increased for the monitors and the OSDs.

Monitor Debug Settings

debug_ms = 5
debug_mon = 20
debug_paxos = 20
debug_auth = 20

Example Log Output

2016-02-12 12:37:04.278761 7f45a9afc700 10 mon.cephn2@0(leader).osd e322 e322: 2 osds: 2 up, 2 in
2016-02-12 12:37:04.278792 7f45a9afc700 10 mon.cephn2@0(leader).osd e322  min_last_epoch_clean 322
2016-02-12 12:37:04.278795 7f45a9afc700 10 mon.cephn2@0(leader).log v1010106 log
2016-02-12 12:37:04.278799 7f45a9afc700 10 mon.cephn2@0(leader).auth v2877 auth
2016-02-12 12:37:04.278811 7f45a9afc700 20 mon.cephn2@0(leader) e1 sync_trim_providers
2016-02-12 12:37:09.278914 7f45a9afc700 11 mon.cephn2@0(leader) e1 tick
2016-02-12 12:37:09.278949 7f45a9afc700 10 mon.cephn2@0(leader).pg v8126 v8126: 64 pgs: 64 active+clean; 60168 kB data, 172 MB used, 20285 MB / 20457 MB avail
2016-02-12 12:37:09.278975 7f45a9afc700 10 mon.cephn2@0(leader).paxosservice(pgmap 7511..8126) maybe_trim trim_to 7626 would only trim 115 < paxos_service_trim_min 250
2016-02-12 12:37:09.278982 7f45a9afc700 10 mon.cephn2@0(leader).osd e322 e322: 2 osds: 2 up, 2 in
2016-02-12 12:37:09.278989 7f45a9afc700  5 mon.cephn2@0(leader).paxos(paxos active c 1028850..1029466) is_readable = 1 - now=2016-02-12 12:37:09.278990 lease_expire=0.000000 has v0 lc 1029466
....
2016-02-12 12:59:18.769963 7f45a92fb700  1 -- 192.168.0.112:6789/0 <== osd.1 192.168.0.114:6800/2801 5724 ==== pg_stats(0 pgs tid 3045 v 0) v1 ==== 124+0+0 (2380105412 0 0) 0x5d96300 con 0x4d5bf40
2016-02-12 12:59:18.770053 7f45a92fb700  1 -- 192.168.0.112:6789/0 --> 192.168.0.114:6800/2801 -- pg_stats_ack(0 pgs tid 3045) v1 -- ?+0 0x550ae00 con 0x4d5bf40
2016-02-12 12:59:32.916397 7f45a9afc700  0 mon.cephn2@0(leader).data_health(1) update_stats avail 53% total 1951 MB, used 780 MB, avail 1053 MB
....
2016-02-12 13:01:05.256263 7f45a92fb700  1 -- 192.168.0.112:6789/0 --> 192.168.0.113:6800/2410 -- mon_subscribe_ack(300s) v1 -- ?+0 0x4f283c0 con 0x4d5b440

OSD Debug Settings

debug_ms = 5
debug_osd = 20
debug_filestore = 20
debug_journal = 20

Example Log Output

2016-02-12 11:27:53.869151 7f5d55d84700  1 -- 192.168.17.3:0/2410 --> 192.168.17.4:6801/2801 -- osd_ping(ping e322 stamp 2016-02-12 11:27:53.869147) v2 -- ?+0 0x63baa00 con 0x578dee0
2016-02-12 11:27:53.869214 7f5d55d84700  1 -- 192.168.17.3:0/2410 --> 192.168.0.114:6801/2801 -- osd_ping(ping e322 stamp 2016-02-12 11:27:53.869147) v2 -- ?+0 0x638f200 con 0x578e040
2016-02-12 11:27:53.870215 7f5d6359f700  1 -- 192.168.17.3:0/2410 <== osd.1 192.168.0.114:6801/2801 109210 ==== osd_ping(ping_reply e322 stamp 2016-02-12 11:27:53.869147) v2 ==== 47+0+0 (261193640 0 0) 0x63c1a00 con 0x578e040
2016-02-12 11:27:53.870698 7f5d6359f700  1 -- 192.168.17.3:0/2410 <== osd.1 192.168.17.4:6801/2801 109210 ==== osd_ping(ping_reply e322 stamp 2016-02-12 11:27:53.869147) v2 ==== 47+0+0 (261193640 0 0) 0x6313200 con 0x578dee0
....
2016-02-12 11:28:10.432313 7f5d6e71f700  5 osd.0 322 tick
2016-02-12 11:28:10.432375 7f5d6e71f700 20 osd.0 322 scrub_random_backoff lost coin flip, randomly backing off
2016-02-12 11:28:10.432381 7f5d6e71f700 10 osd.0 322 do_waiters -- start
2016-02-12 11:28:10.432383 7f5d6e71f700 10 osd.0 322 do_waiters -- finish
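At high verbosity the logs grow quickly, so it can help to filter them by message level when reading. The sketch below assumes, based on the sample output above, that the fourth whitespace-separated field (after the date, time, and thread ID) is the message's debug level. The function name `filter_by_level` is hypothetical, and the log file name in the example assumes a cluster named ceph.

```shell
#!/bin/sh
# Print only log lines whose debug level is at or below the given maximum.
# Usage: filter_by_level <max-level> [log-file ...]
# Reads standard input when no file is given.
filter_by_level() {
    max="$1"
    shift
    # Skip lines whose fourth field is not a number (for example, "....").
    awk -v max="$max" '$4 ~ /^[0-9]+$/ && $4 + 0 <= max + 0' "$@"
}

# Example: show messages at level 5 or lower from the osd.0 log.
# filter_by_level 5 /var/log/ceph/ceph-osd.0.log
```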

10.5.2. Logging Settings

Logging and debugging settings are not required in a Ceph configuration file, but you can override default settings as needed. Ceph supports the following settings:

log_file
Description
The location of the logging file for your cluster.
Type
String
Required
No
Default
/var/log/ceph/$cluster-$name.log
log_max_new
Description
The maximum number of new log files.
Type
Integer
Required
No
Default
1000
log_max_recent
Description
The maximum number of recent events to include in a log file.
Type
Integer
Required
No
Default
1000000
log_to_stderr
Description
Determines if logging messages appear in stderr.
Type
Boolean
Required
No
Default
true
err_to_stderr
Description
Determines if error messages appear in stderr.
Type
Boolean
Required
No
Default
true
log_to_syslog
Description
Determines if logging messages appear in syslog.
Type
Boolean
Required
No
Default
false
err_to_syslog
Description
Determines if error messages appear in syslog.
Type
Boolean
Required
No
Default
false
log_flush_on_exit
Description
Determines if Ceph flushes the log files after exit.
Type
Boolean
Required
No
Default
true
clog_to_monitors
Description
Determines if clog messages will be sent to monitors.
Type
Boolean
Required
No
Default
true
clog_to_syslog
Description
Determines if clog messages will be sent to syslog.
Type
Boolean
Required
No
Default
false
mon_cluster_log_to_syslog
Description
Determines if the cluster log will be output to syslog.
Type
Boolean
Required
No
Default
false
mon_cluster_log_file
Description
The location of the cluster’s log file.
Type
String
Required
No
Default
/var/log/ceph/$cluster.log

10.5.2.1. OSD

osd_preserve_trimmed_log
Description
Preserves trimmed logs after trimming.
Type
Boolean
Required
No
Default
false
osd_tmapput_sets_uses_tmap
Description
Uses tmap. For debug only.
Type
Boolean
Required
No
Default
false
osd_min_pg_log_entries
Description
The minimum number of log entries for placement groups.
Type
32-bit Unsigned Integer
Required
No
Default
1000
osd_op_log_threshold
Description
The number of op log messages to show in one pass.
Type
Integer
Required
No
Default
5

10.5.2.2. File Store

filestore_debug_omap_check
Description
Debugging check on synchronization. This is an expensive operation.
Type
Boolean
Required
No
Default
false

10.5.2.3. RADOS Gateway

rgw_log_nonexistent_bucket
Description
Log non-existent buckets.
Type
Boolean
Required
No
Default
false
rgw_log_object_name
Description
The logging format for an object name.
Type
String
Required
No
Default
%Y-%m-%d-%H-%i-%n
rgw_log_object_name_utc
Description
Determines if the logged object name includes a UTC time.
Type
Boolean
Required
No
Default
false
rgw_enable_ops_log
Description
Enables logging of every RGW operation.
Type
Boolean
Required
No
Default
true
rgw_enable_usage_log
Description
Enable logging of RGW’s bandwidth usage.
Type
Boolean
Required
No
Default
true
rgw_usage_log_flush_threshold
Description
Threshold to flush pending log data.
Type
Integer
Required
No
Default
1024
rgw_usage_log_tick_interval
Description
The interval, in seconds, at which pending log data is flushed.
Type
Integer
Required
No
Default
30
rgw_intent_log_object_name
Description
The format of the intent log object name.
Type
String
Required
No
Default
%Y-%m-%d-%i-%n
rgw_intent_log_object_name_utc
Description
Include a UTC time stamp in the intent log object name.
Type
Boolean
Required
No
Default
false
© 2024 Red Hat, Inc.