Chapter 16. PG Count
The number of placement groups in a pool plays a significant role in how a cluster peers, distributes data, and rebalances. Small clusters do not gain as much performance from increasing the number of placement groups as large clusters do. However, clusters with many pools accessing the same OSDs need to consider PG count carefully so that Ceph OSDs use resources efficiently.
16.1. Configuring Default PG Counts
When you create a pool, you also create a number of placement groups for the pool. If you don’t specify the number of placement groups, Ceph uses the default value of 8, which is unacceptably low. You can increase the number of placement groups for a pool, but we recommend setting reasonable default values in your Ceph configuration file too.
    osd pool default pg num = 300
    osd pool default pgp num = 300
You need to set both the total number of placement groups (pg_num) and the number of placement groups used when calculating placement (pgp_num, which comes into play during PG splitting). The two values should be equal.
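You can also raise both values on an existing pool at runtime with the ceph CLI. The pool name mypool below is only an example:

    # Increase pg_num first, then match pgp_num to it so the two stay equal
    ceph osd pool set mypool pg_num 512
    ceph osd pool set mypool pgp_num 512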
16.2. PG Count for Small Clusters
Small clusters don’t benefit from large numbers of placement groups, so consider the following values (a pool-creation example follows the list):
- Less than 5 OSDs: set pg_num and pgp_num to 128.
- Between 5 and 10 OSDs: set pg_num and pgp_num to 512.
- Between 10 and 50 OSDs: set pg_num and pgp_num to 4096.
- More than 50 OSDs: you need to understand the tradeoffs and how to calculate the pg_num and pgp_num values yourself. See Calculating PG Count (Section 16.3).
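For example, on a cluster with fewer than 5 OSDs you might create a pool as follows; the pool name smallpool is illustrative:

    # Create a replicated pool with pg_num and pgp_num of 128,
    # per the guidance for clusters with fewer than 5 OSDs
    ceph osd pool create smallpool 128 128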
As the number of OSDs increases, choosing the right values for pg_num and pgp_num becomes more important because they have a significant influence on the behavior of the cluster as well as the durability of the data when something goes wrong (that is, the probability that a catastrophic event leads to data loss).
16.3. Calculating PG Count
If you have more than 50 OSDs, we recommend approximately 50-100 placement groups per OSD to balance resource usage, data durability, and distribution. If you have fewer than 50 OSDs, use the values in PG Count for Small Clusters above. For a single pool of objects, you can use the following formula to get a baseline:
                 (OSDs * 100)
    Total PGs = --------------
                  pool size
Where pool size is either the number of replicas for replicated pools or the K+M sum for erasure-coded pools (as returned by ceph osd erasure-code-profile get).
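To find K and M, query the erasure-code profile the pool was created with; the default profile is shown here, and the exact output fields vary by release:

    # Print the profile's parameters; pool size for the formula is k + m
    ceph osd erasure-code-profile get default

For example, a profile with k=2 and m=1 gives a pool size of 3.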
You should then check whether the result makes sense given the way you designed your Ceph cluster, so as to maximize data durability and distribution while minimizing resource usage.
The result should be rounded up to the nearest power of two. Rounding up is optional, but recommended for CRUSH to evenly balance the number of objects among placement groups.
For a cluster with 200 OSDs and a pool size of 3 replicas, you would estimate your number of PGs as follows:
    (200 * 100)
    ----------- = 6667
         3

    Nearest power of 2: 8192
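The same computation is easy to script. This is a minimal sketch, with the OSD count and pool size hard-coded to match the example above:

    # Baseline PG count: ceiling of (OSDs * 100) / pool size,
    # then round up to the next power of two
    osds=200; size=3
    pgs=$(( (osds * 100 + size - 1) / size ))
    pow=1
    while [ "$pow" -lt "$pgs" ]; do pow=$(( pow * 2 )); done
    echo "baseline=$pgs rounded=$pow"   # baseline=6667 rounded=8192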
With 8192 placement groups distributed across 200 OSDs, that evaluates to approximately 41 placement groups per OSD. You also need to consider the number of pools you are likely to use in your cluster, since each pool will create placement groups too. Ensure that you have a reasonable maximum PG count.
16.4. Maximum PG Count
When using multiple data pools for storing objects, you need to ensure that you balance the number of placement groups per pool with the number of placement groups per OSD so that you arrive at a reasonable total number of placement groups. The aim is to achieve reasonably low variance per OSD without taxing system resources or making the peering process too slow.
In an exemplary Ceph Storage Cluster consisting of 10 pools, each pool with 512 placement groups on ten OSDs, there is a total of 5,120 placement groups spread over ten OSDs, or 512 placement groups per OSD. That may not use too many resources depending on your hardware configuration. By contrast, if you create 1,000 pools with 512 placement groups each, the OSDs will handle ~50,000 placement groups each and it would require significantly more resources. Operating with too many placement groups per OSD can significantly reduce performance, especially during rebalancing or recovery.
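You can see how many placement groups each OSD currently holds with ceph osd df, which reports a PGS column per OSD:

    # Show utilization and PG count (PGS column) for every OSD
    ceph osd df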
The Ceph Storage Cluster has a default maximum value of 300 placement groups per OSD. You can set a different maximum value in your Ceph configuration file:

    mon pg warn max per osd = 300
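On a running cluster you may also be able to adjust the threshold without a restart via injectargs; whether this option is accepted at runtime depends on your Ceph release, so treat the line below as a sketch:

    # Raise the per-OSD PG warning threshold on all monitors at runtime
    ceph tell 'mon.*' injectargs '--mon_pg_warn_max_per_osd 500'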
Ceph Object Gateways deploy with 10-15 pools, so you may consider using fewer than 100 PGs per OSD to arrive at a reasonable maximum number, as in the estimate below.
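As a back-of-the-envelope check, with hypothetical numbers (15 pools of 64 PGs each, 30 OSDs):

    # Per-OSD PG count for a hypothetical RGW deployment:
    # 15 pools * 64 PGs each / 30 OSDs (replicas not counted,
    # matching the 10-pool arithmetic above)
    echo $(( 15 * 64 / 30 ))    # prints 32

Counting replicas (pool size 3), each OSD would hold roughly 96 PG copies, still under the 100-PG target.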