Drive Groups

This is a new way of defining OSD layouts included in DeepSea versions >= v0.9.14.

Older versions still follow the proposal runner approach.

Usage

There is a single file (/srv/salt/ceph/configuration/files/drive_groups.yml) which acts as a single source of information.

Preface

In order to create an OSD layout, you only have to edit a single file. As opposed to the previous approach, where several yaml files were generated based on the parameters given to the proposal runner, the layout is now computed on the fly. This gives more flexibility and avoids the need to refresh the pillar after each layout change. There are further advantages (link to the nautilus osd-deployment wikis)

Workflow

The file mentioned above is located in /srv/salt/ceph/configuration/files/drive_groups.yml and acts as the single source of information for the OSD layout.

  1. Open the file mentioned above
  2. Fill it with a Drive Group spec according to your needs (see the Examples section below)
     2.1 If you are not sure about the properties of your disks, you can run salt-run disks.details

A very basic example would be:

# /srv/salt/ceph/configuration/files/drive_groups.yml
default_drive_group_name:   # <- this is the name of the drive_group (name can be custom)
  target: '*'  # <- the target - salt notation can be used 
  data_devices: # <- the type of devices you are applying specs to
    all: true # <- a specification, check below for a full list

Remember to use spaces instead of tabs (YAML).

That would simply use all drives available to Ceph as standalone OSDs.

  3. Verify the results

There are new runners and modules which do the computation. The new user-facing runner is called disks. It exposes two main functions that can be used to verify your Drive Group.

salt-run disks.list

This returns a structure of matching disks based on your Drive Group.

If you're not satisfied with the output, just go back to step 2, edit the file and run salt-run disks.list again.

Rinse and repeat until you are happy.

  4. Deploy

On the next invocation of stage 3, these disks will be deployed accordingly.
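Depending on how you normally drive DeepSea, stage 3 is typically triggered via the orchestration runner (adjust to your own workflow):

salt-run state.orch ceph.stage.3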

Specification

drive_group_default_name:
  target: '*'
  data_devices:
    device_spec: arg 
  db_devices:
    device_spec: arg
  wal_devices:
    device_spec: arg
  block_wal_size: '5G' (optional)
  block_db_size: '5G' (optional)
  osds_per_device: 1 # Number of OSD daemons per device. To fully utilize NVMe devices, multiple OSDs are required.
  format: bluestore/filestore (defaults to bluestore)
  encrypted: True/False (defaults to False)
  db_slots: 5 (optional)
  wal_slots: 1 (optional)
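As a concrete illustration (the group name and values are made up), a group that places two OSD daemons on every non-rotating data device could look like this, using the rotational filter explained further below:

nvme_drive_group:        # <- illustrative name
  target: '*'
  data_devices:
    rotational: 0        # <- match non-rotating devices (SSDs/NVMes)
  osds_per_device: 2     # <- two OSD daemons per device, useful for NVMe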

When using filestore, the specification would look something like this:

drive_group_default_name:
  target: '*'
  data_devices:
    device_spec: arg
  journal_devices:
    device_spec: arg
  format: filestore
  encrypted: True/False (optional)
  journal_size: '500M' (optional)
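A concrete filestore group might then look like this (a sketch only; the model names are placeholders):

filestore_drive_group:
  target: '*'
  data_devices:
    model: disk_model_name        # <- placeholder, substring match (see below)
  journal_devices:
    model: journal_disk_model     # <- placeholder
  format: filestore
  journal_size: '500M'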

In the templates above, {device_spec} can be one (or a combination) of the following matchers:

Substring Matching:

# substring match on the ID_MODEL property of the drive
model: disk_model_name

# substring match on the VENDOR property of the drive
vendor: disk_vendor_name
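Several filters can be combined within one device section, as in the vendor/size example further below (the values here are placeholders):

data_devices:
  vendor: disk_vendor_name
  model: disk_model_name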

Size Matching:

# Please note the quotes around the values when using delimiter notation.
# YAML would otherwise interpret the ':' as a new hash.

# Size specification of the form LOW:HIGH. Can also take
# the form :HIGH, LOW:, or an exact value (as ceph-volume inventory reports)

size: '10G' # - Includes disks of an exact size.

size: '10G:40G' # - Includes disks which size is within the range.

size: ':10G' # - Includes disks less than or equal to 10G in size.

size: '40G:' # - Includes disks equal to or greater than 40G in size.

# Sizes don't have to be exclusively in Gigabyte (G).
# Supported units are Megabyte (M), Gigabyte (G) and Terabyte (T). Appending (B) for byte is also supported: MB, GB, TB.

Equality Matcher:

# is the drive rotating or not (SSDs and NVMEs don't rotate)
rotational: 0

The device_spec for data_devices may also simply be all instead of a yaml structure. This offers a convenient way to deploy a node using all available drives as standalone OSDs.

All Matcher:

# This matcher is exclusive to the data_devices section.
all: true

Limiter:

If you have specified valid filters but want to limit the number of matching disks, you can use the 'limit' directive.

# if this is present, limit the number of matching drives to this number. 
limit: 10
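For example, to use at most 10 of the matching rotating disks as data devices (a sketch; the group name is made up):

limited_drive_group:
  target: '*'
  data_devices:
    rotational: 1
    limit: 10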

This new structure serves as a declarative way to specify OSD deployments. On a per-host basis, OSD deployments are defined by the list of devices and their intended use (data, wal, db or journal) and a list of flags for the deployment tool (ceph-volume in this case).

The Drive Group specification (dg) is intended to be created manually by a user and specifies a group of OSDs that are interrelated (hybrid OSDs that are deployed on solid state drives and spinners) or share the same deployment options (identical, i.e. same objectstore, same encryption option, ... standalone OSDs).

To avoid explicitly listing devices, we rely on a list of filter items. These correspond to a few selected fields of ceph-volume inventory reports. In the simplest case this could be the rotational flag (all solid-state drives are to be db_devices, all rotating ones data_devices) or something more involved like model strings, sizes or others. DeepSea provides code that translates these drive groups into actual device lists for inspection by the user.

Example Drive Group Files

2 Nodes with the same setup:

  • 20 HDDs
    • Vendor: Intel
    • Model: SSD-123-foo
    • Size: 4TB
  • 2 SSDs
    • Vendor: Micron
    • Model: MC-55-44-ZX
    • Size: 512GB

This is a common setup and can be described quite easily:

The simple case

drive_group_default:
  target: '*'
  data_devices:
    model: SSD-123-foo
  db_devices:
    model: MC-55-44-ZX

This is a simple and valid configuration, but it may not be future-proof. The user may add disks of different vendors in the future, which wouldn't be matched by this configuration.

We can improve the configuration by filtering on core properties of the drives:

drive_group_default:
  target: '*'
  data_devices:
    rotational: 1
  db_devices:
    rotational: 0

Now all rotating devices are declared as 'data devices', and all non-rotating devices will be used as shared devices (wal, db).

If you know that drives with more than 2TB will always be the slower data devices, you can also filter by size:

drive_group_default:
  target: '*'
  data_devices:
    size: '2TB:'
  db_devices:
    size: ':2TB'

Forcing encryption on your OSDs is as simple as appending 'encrypted: True' to the drive group specification.

drive_group_default:
  target: '*'
  data_devices:
    size: '2TB:'
  db_devices:
    size: ':2TB'
  encrypted: True

This was a rather simple setup. Following this approach you can also describe more sophisticated setups.

The advanced case

  • 20 HDDs
    • Vendor: Intel
    • Model: SSD-123-foo
    • Size: 4TB
  • 12 SSDs
    • Vendor: Micron
    • Model: MC-55-44-ZX
    • Size: 512GB
  • 2 NVMEs
    • Vendor: Samsung
    • Model: NVME-QQQQ-987
    • Size: 256GB

Here we have two distinct setups:

  • 20 HDDs should share 2 SSDs
  • 10 SSDs should share 2 NVMes

This can be described with two drive groups.

drive_group:
  target: '*'
  data_devices:
    rotational: 1
  db_devices:
    model: MC-55-44-ZX
  db_slots: 10 # How many OSDs per DB device

Setting db_slots: 10 ensures that only two SSDs will be used (10 left).

followed by

drive_group_default:
  target: '*'
  data_devices:
    model: MC-55-44-ZX
  db_devices:
    vendor: samsung
    size: 256GB
  db_slots: 5 # How many OSDs per DB device

The advanced case (with non-uniform nodes)

The examples above assumed that all nodes have the same drives. That's however not always the case. Example:

Node1-5:

  • 20 HDDs
    • Vendor: Intel
    • Model: SSD-123-foo
    • Size: 4TB
  • 2 SSDs
    • Vendor: Micron
    • Model: MC-55-44-ZX
    • Size: 512GB

Node6-10:

  • 5 NVMEs
    • Vendor: Intel
    • Model: SSD-123-foo
    • Size: 4TB
  • 20 SSDs
    • Vendor: Micron
    • Model: MC-55-44-ZX
    • Size: 512GB

You can use the 'target' key in the layout to target certain nodes. Salt target notation helps to keep things easy.

drive_group_node_one_to_five:
  target: 'node[1-5]'
  data_devices:
    rotational: 1
  db_devices:
    rotational: 0

followed by:

drive_group_the_rest:
  target: 'node[6-10]'
  data_devices:
    model: MC-55-44-ZX
  db_devices:
    model: SSD-123-foo

The expert case

All previous cases co-located the WALs with the DBs. It is, however, possible to deploy the WAL on a dedicated device as well (if it makes sense).

  • 20 HDDs
    • Vendor: Intel
    • Model: SSD-123-foo
    • Size: 4TB
  • 2 SSDs
    • Vendor: Micron
    • Model: MC-55-44-ZX
    • Size: 512GB
  • 2 NVMEs
    • Vendor: Samsung
    • Model: NVME-QQQQ-987
    • Size: 256GB

drive_group_default:
  target: '*'
  data_devices:
    model: SSD-123-foo
  db_devices:
    model: MC-55-44-ZX
  wal_devices:
    model: NVME-QQQQ-987
  db_slots: 10
  wal_slots: 10

The very unlikely (but possible) case

Neither Ceph, DeepSea nor ceph-volume prevents you from making questionable decisions.

  • 23 HDDs
    • Vendor: Intel
    • Model: SSD-123-foo
    • Size: 4TB
  • 10 SSDs
    • Vendor: Micron
    • Model: MC-55-44-ZX
    • Size: 512GB
  • 1 NVMe
    • Vendor: Samsung
    • Model: NVME-QQQQ-987
    • Size: 256GB

Here we are trying to define:

  • 20 HDDs backed by 1 NVME
  • 2 HDDs backed by 1 SSD(db) and 1 NVME(wal)
  • 8 SSDs backed by 1 NVME
  • 2 SSDs standalone (encrypted)
  • 1 HDD is spare and should not be deployed

drive_group_hdd_nvme:
  target: '*'
  data_devices:
    rotational: 1
  db_devices:
    model: NVME-QQQQ-987
  db_slots: 20

drive_group_hdd_ssd_nvme:
  target: '*'
  data_devices:
    rotational: 1
  db_devices:
    model: MC-55-44-ZX
  wal_devices:
    model: NVME-QQQQ-987
  db_slots: 2
  wal_slots: 2

drive_group_ssd_nvme:
  target: '*'
  data_devices:
    model: MC-55-44-ZX
  db_devices:
    model: NVME-QQQQ-987
  db_slots: 8

drive_group_ssd_standalone_encrypted:
  target: '*'
  data_devices:
    model: MC-55-44-ZX
  encrypted: True

One HDD will remain unused, as the file is parsed from top to bottom and the db_slots values (formerly 'ratios') are strictly defined.