Chapter 4. Create a Cluster
If at any point you run into trouble and you want to start over, execute the following to purge the configuration:
ceph-deploy purgedata <ceph-node> [<ceph-node>]
ceph-deploy forgetkeys
To purge the Ceph packages as well, execute:
ceph-deploy purge <ceph-node> [<ceph-node>]
If you execute purge, you must re-install Ceph.
On your Calamari admin node, from the directory you created to hold your configuration details, perform the following steps using ceph-deploy.
Create the cluster:

ceph-deploy new <initial-monitor-node(s)>

For example:

ceph-deploy new node1

Check the output of ceph-deploy with ls and cat in the current directory. You should see a Ceph configuration file, a monitor secret keyring, and a log file of the ceph-deploy procedures. At this stage, you may begin editing your Ceph configuration file.
Note: If you choose not to use ceph-deploy, you will have to deploy Ceph manually, or refer to the Ceph manual deployment documentation and configure a deployment tool (e.g., Chef, Juju, Puppet, etc.) to perform each operation ceph-deploy performs for you.

Add the public_network and cluster_network settings under the [global] section of your Ceph configuration file:

public_network = <ip-address>/<netmask>
cluster_network = <ip-address>/<netmask>

These settings distinguish which network is public (front-side) and which network is for the cluster (back-side). Ensure that your nodes have interfaces configured for these networks. We do not recommend using the same NIC for the public and cluster networks.
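Before writing these values into ceph.conf, it can help to confirm that the two CIDR ranges are valid and do not overlap. A minimal sketch using Python's ipaddress module; the example networks are hypothetical placeholders, substitute your own:

```python
import ipaddress

# Hypothetical example networks for ceph.conf; substitute your own.
public_network = ipaddress.ip_network("192.168.1.0/24")
cluster_network = ipaddress.ip_network("10.0.0.0/24")

# The public (front-side) and cluster (back-side) networks should be distinct.
assert not public_network.overlaps(cluster_network), "networks must not overlap"

print(f"public_network = {public_network}")
print(f"cluster_network = {cluster_network}")
```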
Turn on IPv6 if you intend to use it:

ms_bind_ipv6 = true

Add or adjust the osd_journal_size setting under the [global] section of your Ceph configuration file:

osd_journal_size = 10000

We recommend a general setting of 10 GB. Ceph's default osd_journal_size is 0, so you will need to set this in your ceph.conf file. To size the journal, take the product of filestore_max_sync_interval and the expected throughput, and multiply that product by two (2). The expected throughput should account for both disk throughput (i.e., sustained data transfer rate) and network throughput; for example, a 7200 RPM disk will likely provide approximately 100 MB/s. Taking the min() of the disk and network throughput should give a reasonable expected throughput.

Set the number of copies to store (default is 3) and the minimum number of copies required for writes in a degraded state (default is 2) under the [global] section of your Ceph configuration file. We recommend the default values for production clusters.

osd_pool_default_size = 3
osd_pool_default_min_size = 2

For a quick start, you may wish to set osd_pool_default_size to 2 and osd_pool_default_min_size to 1 so that you can achieve an active+clean state with only two OSDs. These settings establish the networking bandwidth requirements for the cluster network, and the ability to write data with eventual consistency (i.e., you can write data to a cluster in a degraded state if it already has min_size copies of the data).

Set the default number of placement groups (osd_pool_default_pg_num) and placement groups for placement (osd_pool_default_pgp_num) for a pool under the [global] section of your Ceph configuration file. The number you specify depends upon the number of OSDs in your cluster. For small clusters (< 5 OSDs), we recommend 128 placement groups per pool. The osd_pool_default_pg_num and osd_pool_default_pgp_num values should be equal.

osd_pool_default_pg_num = <n>
osd_pool_default_pgp_num = <n>

- Less than 5 OSDs: set pg_num and pgp_num to 128
- Between 5 and 10 OSDs: set pg_num and pgp_num to 512
- Between 10 and 50 OSDs: set pg_num and pgp_num to 4096

If you have more than 50 OSDs, you need to understand the tradeoffs and how to calculate the pg_num and pgp_num values yourself. Generally, you may use the formula:

             (OSDs * 100)
Total PGs = --------------
               pool size

Where pool size in the formula above is the osd_pool_default_size value you set in the preceding step. For best results, round the result of this formula up to the nearest power of two. Rounding is optional, but it helps CRUSH balance objects evenly across placement groups.
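The two sizing rules above can be checked with a little arithmetic. A sketch, assuming a hypothetical 9-OSD cluster with the default pool size of 3, a 100 MB/s expected throughput, and a 15-second filestore_max_sync_interval (the OSD count and interval are assumed example values):

```python
import math

# Journal sizing: 2 x expected throughput x filestore_max_sync_interval.
# 1000 MB/s network and 15 s sync interval are assumed example values.
expected_throughput_mb_s = min(100, 1000)  # min(disk, network) throughput
filestore_max_sync_interval_s = 15
journal_size_mb = 2 * expected_throughput_mb_s * filestore_max_sync_interval_s
print(journal_size_mb)  # 3000

# Placement groups: Total PGs = (OSDs * 100) / pool size,
# rounded up to the nearest power of two.
osds = 9        # assumed example cluster
pool_size = 3   # osd_pool_default_size
total_pgs = (osds * 100) / pool_size
pg_num = 2 ** math.ceil(math.log2(total_pgs))
print(pg_num)  # 512
```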
Set the maximum number of placement groups per OSD. The Ceph Storage Cluster has a default maximum of 300 placement groups per OSD. You can set a different maximum value in your Ceph configuration file:

mon_pg_warn_max_per_osd = <n>

Multiple pools can use the same CRUSH ruleset. When an OSD has too many placement groups associated with it, Ceph performance may degrade due to resource use and load. This setting warns you, but you may adjust it to your needs and the capabilities of your hardware.
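To see whether a planned layout would trip this warning, a common rule of thumb is to estimate placement groups per OSD as pools * pg_num * replicas / OSDs. A sketch with assumed example numbers:

```python
# Estimate placement groups per OSD for a planned cluster.
# All figures below are assumed example values.
pools = 3
pg_num_per_pool = 512
replicas = 3          # osd_pool_default_size
osds = 10

pgs_per_osd = pools * pg_num_per_pool * replicas / osds
print(pgs_per_osd)  # 460.8

# Compare against the default warning threshold of 300 per OSD.
if pgs_per_osd > 300:
    print("warning: exceeds the default maximum of 300 PGs per OSD")
```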
Set a CRUSH leaf type to the largest serviceable failure domain for your replicas under the [global] section of your Ceph configuration file. The default value is 1, or host, which means that CRUSH will map replicas to OSDs on separate hosts. For example, if you want to make three object replicas and you have three racks of chassis/hosts, you can set osd_crush_chooseleaf_type to 3, and CRUSH will place each copy of an object on OSDs in different racks:

osd_crush_chooseleaf_type = 3

The default CRUSH hierarchy types are:
- type 0 osd
- type 1 host
- type 2 chassis
- type 3 rack
- type 4 row
- type 5 pdu
- type 6 pod
- type 7 room
- type 8 datacenter
- type 9 region
- type 10 root
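The table above can be read as a lookup from the osd_crush_chooseleaf_type value to the bucket type that CRUSH separates replicas across; a small sketch:

```python
# Default CRUSH hierarchy types, as listed above.
crush_types = {
    0: "osd", 1: "host", 2: "chassis", 3: "rack", 4: "row",
    5: "pdu", 6: "pod", 7: "room", 8: "datacenter", 9: "region", 10: "root",
}

# The default failure domain (type 1) keeps replicas on separate hosts.
print(crush_types[1])  # host

# osd_crush_chooseleaf_type = 3 spreads each replica across racks.
print(crush_types[3])  # rack
```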
Set max_open_files so that Ceph sets the maximum open file descriptors at the OS level, to help prevent Ceph OSD Daemons from running out of file descriptors:

max_open_files = 131072

We recommend having settings for clock drift in your Ceph configuration in addition to setting up NTP on your monitor nodes, because clock drift is a common reason monitors fail to achieve consensus on the state of the cluster. We also recommend setting the report timeout and the down-out interval in the Ceph configuration file so you have a reference point for how long an OSD can be down before the cluster starts re-balancing:
mon_clock_drift_allowed = .15
mon_clock_drift_warn_backoff = 30
mon_osd_down_out_interval = 300
mon_osd_report_timeout = 300

Set the full_ratio and near_full_ratio to acceptable values. They default to 95% for full and 85% for near full. You may also set backfill_full_ratio so that OSDs don't accept backfill requests when they are already near capacity:
mon_osd_full_ratio = .75
mon_osd_nearfull_ratio = .65
osd_backfill_full_ratio = .65

Consider the amount of storage capacity that would be unavailable during the failure of a large-grained failure domain such as a rack (e.g., the failure of a power distribution unit or a rack switch). If you have stringent high availability requirements, weigh the cost/benefit tradeoff of keeping that amount of extra capacity available to absorb such a failure. As a best practice, as you get close to reaching the full ratio, you should start receiving "near full" warnings so that you have ample time to provision additional hardware for your cluster. "Near full" warnings may be annoying, but they are not as annoying as an interruption of service.
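It can help to translate the ratios into actual capacity numbers for your cluster. A sketch assuming a hypothetical 100 TB of raw capacity and the example ratios shown above:

```python
raw_capacity_tb = 100  # assumed example cluster capacity

mon_osd_full_ratio = 0.75
mon_osd_nearfull_ratio = 0.65

# Capacity consumed before warnings begin and before writes stop.
nearfull_tb = round(raw_capacity_tb * mon_osd_nearfull_ratio, 2)
full_tb = round(raw_capacity_tb * mon_osd_full_ratio, 2)
print(nearfull_tb)  # 65.0
print(full_tb)      # 75.0

# Headroom held in reserve to absorb a failure-domain loss.
print(raw_capacity_tb - full_tb)  # 25.0
```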
Important: When your cluster reaches its full ratio, Ceph prevents clients from accessing the cluster to ensure data durability. This results in a service interruption, so you should carefully consider the implications of capacity planning and of reaching full capacity, especially in view of failure.
In summary, your initial Ceph configuration file should have at least the following settings with appropriate values assigned after the = sign:
[global]
fsid = <cluster-id>
mon_initial_members = <hostname>[, <hostname>]
mon_host = <ip-address>[, <ip-address>]
public_network = <network>[, <network>]
cluster_network = <network>[, <network>]
ms_bind_ipv6 = [true | false]
max_open_files = 131072
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
osd_journal_size = <n>
filestore_xattr_use_omap = true
osd_pool_default_size = <n> # Write an object n times.
osd_pool_default_min_size = <n> # Allow writing n copies in a degraded state.
osd_pool_default_pg_num = <n>
osd_pool_default_pgp_num = <n>
osd_crush_chooseleaf_type = <n>
mon_osd_full_ratio = <n>
mon_osd_nearfull_ratio = <n>
osd_backfill_full_ratio = <n>
mon_clock_drift_allowed = .15
mon_clock_drift_warn_backoff = 30
mon_osd_down_out_interval = 300
mon_osd_report_timeout = 300
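As a final check, the summary above can be validated mechanically. A minimal sketch that parses a ceph.conf-style [global] section with Python's configparser and asserts two of the consistency rules discussed in this chapter; the fragment's values are placeholders, not recommendations:

```python
import configparser

# A minimal ceph.conf-style fragment with placeholder values.
conf_text = """
[global]
public_network = 192.168.1.0/24
cluster_network = 10.0.0.0/24
osd_journal_size = 10000
osd_pool_default_size = 3
osd_pool_default_min_size = 2
osd_pool_default_pg_num = 128
osd_pool_default_pgp_num = 128
"""

parser = configparser.ConfigParser()
parser.read_string(conf_text)
g = parser["global"]

# pg_num and pgp_num should be equal, and min_size should not exceed size.
assert g["osd_pool_default_pg_num"] == g["osd_pool_default_pgp_num"]
assert int(g["osd_pool_default_min_size"]) <= int(g["osd_pool_default_size"])
print("configuration fragment looks consistent")
```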