03 Block Storage

Tags: 03 storage -- demo ceph 19 rbd pool

RBD block storage overview
- Ceph Block Device
About Pools
Creating pools
Creating and mapping RBD images
Expanding RBD block storage
Ceph data write path
Troubleshooting Ceph RBD warnings

RBD block storage overview

Ceph Block Device

* Thin-provisioned (space is allocated only as data is written, so usage grows on demand)
* Images up to 16 exabytes
* Configurable striping
* In-memory caching
* Snapshots
* Copy-on-write cloning (clones are made from snapshots)
* Kernel driver support
* KVM/libvirt support
* Back-end for cloud solutions
* Incremental backup
* Disaster recovery (multisite asynchronous replication)

About Pools

The Ceph storage system supports the notion of ‘Pools’, which are logical
partitions for storing objects.

Ceph Clients retrieve a Cluster Map from a Ceph Monitor, and write objects to
pools. The pool’s size or number of replicas, the CRUSH rule and the
number of placement groups determine how Ceph will place the data.

Pools set at least the following parameters:

* Ownership/Access to Objects
* The Number of Placement Groups, and
* The CRUSH Rule to Use.

See Set Pool Values for details.

Creating pools

Create a pool

# List pools
$ ceph osd lspools

# Create a pool (running the command without arguments prints its usage)
$ ceph osd pool create
Invalid command: missing required parameter pool(<poolname>)
osd pool create <poolname> <int[0-]> {<int[0-]>} {replicated|erasure} {<erasure_code_profile>} {<rule>} {<int>} {<int>} {<int[0-]>} {<int[0-]>} {<float[0.0-1.0]>} :  create pool
Error EINVAL: invalid command
$ ceph osd pool create ceph-demo 64 64 
pool 'ceph-demo' created

# List pools again
$ ceph osd lspools
1 ceph-demo

Inspect pool settings

# Get the pool's pg_num
$ ceph osd pool get ceph-demo pg_num
pg_num: 64

# Get the pool's pgp_num
$ ceph osd pool get ceph-demo pgp_num
pgp_num: 64

# Get the pool's size (replica count)
$ ceph osd pool get ceph-demo size
size: 3

# Get the pool's crush_rule
$ ceph osd pool get ceph-demo crush_rule
crush_rule: replicated_rule

# Show help
$ ceph osd pool get -h

 General usage: 
 ==============
usage: ceph [-h] [-c CEPHCONF] [-i INPUT_FILE] [-o OUTPUT_FILE]
            [--setuser SETUSER] [--setgroup SETGROUP] [--id CLIENT_ID]
            [--name CLIENT_NAME] [--cluster CLUSTER]
            [--admin-daemon ADMIN_SOCKET] [-s] [-w] [--watch-debug]
            [--watch-info] [--watch-sec] [--watch-warn] [--watch-error]
            [--watch-channel {cluster,audit,*}] [--version] [--verbose]
            [--concise] [-f {json,json-pretty,xml,xml-pretty,plain}]
            [--connect-timeout CLUSTER_TIMEOUT] [--block] [--period PERIOD]

Ceph administration tool

optional arguments:
  -h, --help            request mon help
  -c CEPHCONF, --conf CEPHCONF
                        ceph configuration file
  -i INPUT_FILE, --in-file INPUT_FILE
                        input file, or "-" for stdin
  -o OUTPUT_FILE, --out-file OUTPUT_FILE
                        output file, or "-" for stdout
  --setuser SETUSER     set user file permission
  --setgroup SETGROUP   set group file permission
  --id CLIENT_ID, --user CLIENT_ID
                        client id for authentication
  --name CLIENT_NAME, -n CLIENT_NAME
                        client name for authentication
  --cluster CLUSTER     cluster name
  --admin-daemon ADMIN_SOCKET
                        submit admin-socket commands ("help" for help
  -s, --status          show cluster status
  -w, --watch           watch live cluster changes
  --watch-debug         watch debug events
  --watch-info          watch info events
  --watch-sec           watch security events
  --watch-warn          watch warn events
  --watch-error         watch error events
  --watch-channel {cluster,audit,*}
                        which log channel to follow when using -w/--watch. One
                        of ['cluster', 'audit', '*']
  --version, -v         display version
  --verbose             make verbose
  --concise             make less verbose
  -f {json,json-pretty,xml,xml-pretty,plain}, --format {json,json-pretty,xml,xml-pretty,plain}
  --connect-timeout CLUSTER_TIMEOUT
                        set a timeout for connecting to the cluster
  --block               block until completion (scrub and deep-scrub only)
  --period PERIOD, -p PERIOD
                        polling period, default 1.0 second (for polling
                        commands only)

 Local commands: 
 ===============

ping <mon.id>           Send simple presence/life test to a mon
                        <mon.id> may be 'mon.*' for all mons
daemon {type.id|path} <cmd>
                        Same as --admin-daemon, but auto-find admin socket
daemonperf {type.id | path} [stat-pats] [priority] [<interval>] [<count>]
daemonperf {type.id | path} list|ls [stat-pats] [priority]
                        Get selected perf stats from daemon/admin socket
                        Optional shell-glob comma-delim match string stat-pats
                        Optional selection priority (can abbreviate name):
                         critical, interesting, useful, noninteresting, debug
                        List shows a table of all available stats
                        Run <count> times (default forever),
                         once per <interval> seconds (default 1)
    

 Monitor commands: 
 =================
osd pool get <poolname> size|min_size|pg_num|pgp_num|crush_rule|hashpspool|         get pool parameter <var>
 nodelete|nopgchange|nosizechange|write_fadvise_dontneed|noscrub|nodeep-scrub|hit_  
 set_type|hit_set_period|hit_set_count|hit_set_fpp|use_gmt_hitset|target_max_       
 objects|target_max_bytes|cache_target_dirty_ratio|cache_target_dirty_high_ratio|   
 cache_target_full_ratio|cache_min_flush_age|cache_min_evict_age|erasure_code_      
 profile|min_read_recency_for_promote|all|min_write_recency_for_promote|fast_read|  
 hit_set_grade_decay_rate|hit_set_search_last_n|scrub_min_interval|scrub_max_       
 interval|deep_scrub_interval|recovery_priority|recovery_op_priority|scrub_         
 priority|compression_mode|compression_algorithm|compression_required_ratio|        
 compression_max_blob_size|compression_min_blob_size|csum_type|csum_min_block|csum_ 
 max_block|allow_ec_overwrites|fingerprint_algorithm|pg_autoscale_mode|pg_          
 autoscale_bias|pg_num_min|target_size_bytes|target_size_ratio                      
osd pool get-quota <poolname>                                                       obtain object or byte limits for pool

Change pool defaults

# Change the pool's size (replica count)
$ ceph osd pool set ceph-demo size 2
set pool 1 size to 2
$ ceph osd pool get ceph-demo size
size: 2
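
As a rough illustration of what changing size means for capacity: in a replicated pool every object is stored size times, so usable space is approximately raw space divided by size. The sketch below assumes a purely replicated pool and ignores full ratios and metadata overhead; the function name usable_gib is made up for this example.

```shell
# usable_gib RAW_GIB REPLICAS
# Rough usable capacity of a replicated pool: raw capacity / replica count.
# Assumption: ignores full ratios, metadata overhead, and other pools.
usable_gib() {
  raw_gib=$1
  replicas=$2
  echo $(( raw_gib / replicas ))
}

usable_gib 150 3   # prints 50
usable_gib 150 2   # prints 75
```

With the 150 GiB of raw space reported by ceph -s in this cluster, dropping size from 3 to 2 trades one copy of redundancy for about 25 GiB more usable space.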

# Change the pool's pg_num and pgp_num
$ ceph -s
  cluster:
    id:     97702c43-6cc2-4ef8-bdb5-855cfa90a260
    health: HEALTH_OK
 
  services:
    mon: 3 daemons, quorum node0,node1,node2 (age 6d)
    mgr: node0(active, since 6d), standbys: node1, node2
    osd: 3 osds: 3 up (since 6d), 3 in (since 6d)
 
  task status:
 
  data:
    pools:   1 pools, 64 pgs        # before the change: 64 PGs
    objects: 0 objects, 0 B
    usage:   3.0 GiB used, 147 GiB / 150 GiB avail
    pgs:     64 active+clean
 
$ ceph osd pool set ceph-demo pg_num 128
set pool 1 pg_num to 128
$ ceph osd pool set ceph-demo pgp_num 128
set pool 1 pgp_num to 128

$ ceph -s
  cluster:
    id:     97702c43-6cc2-4ef8-bdb5-855cfa90a260
    health: HEALTH_OK
 
  services:
    mon: 3 daemons, quorum node0,node1,node2 (age 6d)
    mgr: node0(active, since 6d), standbys: node1, node2
    osd: 3 osds: 3 up (since 6d), 3 in (since 6d)
 
  data:
    pools:   1 pools, 128 pgs        # after the change: 128 PGs
    objects: 0 objects, 0 B
    usage:   3.0 GiB used, 147 GiB / 150 GiB avail
    pgs:     128 active+clean
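
The values 64 and 128 above are not derived anywhere in this post. One widely quoted rule of thumb (an assumption here, not something this walkthrough states) is to target about 100 PGs per OSD, divide by the replica count, and round up to a power of two, with that total shared among all pools. The function name suggest_pg_num is hypothetical; on recent releases the pg_autoscale_mode pool option listed in the help output can manage this instead.

```shell
# suggest_pg_num OSDS REPLICAS [TARGET_PGS_PER_OSD]
# Rule-of-thumb total PG count: (osds * target) / replicas,
# rounded up to the next power of two. Target defaults to 100.
suggest_pg_num() {
  osds=$1
  replicas=$2
  target=${3:-100}
  raw=$(( osds * target / replicas ))
  pg=1
  while [ "$pg" -lt "$raw" ]; do
    pg=$(( pg * 2 ))
  done
  echo "$pg"
}

suggest_pg_num 3 3   # 100 rounds up to 128
suggest_pg_num 3 2   # 150 rounds up to 256
```

For the three-OSD cluster used here with size 3, the sketch lands on 128, the value the pool was raised to.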

Creating and mapping RBD images

Create an RBD image

# List RBD images in the pool
$ rbd -p ceph-demo ls

# Show help for rbd create
$ rbd help create
usage: rbd create [--pool <pool>] [--namespace <namespace>] [--image <image>] 
                  [--image-format <image-format>] [--new-format] 
                  [--order <order>] [--object-size <object-size>] 
                  [--image-feature <image-feature>] [--image-shared] 
                  [--stripe-unit <stripe-unit>] 
                  [--stripe-count <stripe-count>] [--data-pool <data-pool>] 
                  [--journal-splay-width <journal-splay-width>] 
                  [--journal-object-size <journal-object-size>] 
                  [--journal-pool <journal-pool>] 
                  [--thick-provision] --size <size> [--no-progress] 
                  <image-spec> 

Create an empty image.

Positional arguments
  <image-spec>              image specification
                            (example: [<pool-name>/[<namespace>/]]<image-name>)

Optional arguments
  -p [ --pool ] arg         pool name
  --namespace arg           namespace name
  --image arg               image name
  --image-format arg        image format [1 (deprecated) or 2]
  --new-format              use image format 2
                            (deprecated)
  --order arg               object order [12 <= order <= 25]
  --object-size arg         object size in B/K/M [4K <= object size <= 32M]
  --image-feature arg       image features
                            [layering(+), exclusive-lock(+*), object-map(+*),
                            deep-flatten(+-), journaling(*)]
  --image-shared            shared image
  --stripe-unit arg         stripe unit in B/K/M
  --stripe-count arg        stripe count
  --data-pool arg           data pool
  --journal-splay-width arg number of active journal objects
  --journal-object-size arg size of journal objects [4K <= size <= 64M]
  --journal-pool arg        pool for journal objects
  --thick-provision         fully allocate storage and zero image
  -s [ --size ] arg         image size (in M/G/T) [default: M]
  --no-progress             disable progress output

Image Features:
  (*) supports enabling/disabling on existing images
  (-) supports disabling-only on existing images
  (+) enabled by default for new images if features not specified

# Two ways to create an RBD image
$ rbd create -p ceph-demo --image rdb-demo.img --size 10G
$ rbd create ceph-demo/rdb-demo1.img --size 10G

# List the newly created RBD images
$ rbd -p ceph-demo ls
rdb-demo.img
rdb-demo1.img

Inspect an RBD image

$ rbd info ceph-demo/rdb-demo.img
rbd image 'rdb-demo.img':
        size 10 GiB in 2560 objects     # 10 GiB image made up of 2560 objects
        order 22 (4 MiB objects)        # each object is 4 MiB
        snapshot_count: 0
        id: 11a95e44ddd2                # image id
        block_name_prefix: rbd_data.11a95e44ddd2    # name prefix of the image's data objects
        format: 2
        features: layering, exclusive-lock, object-map, fast-diff, deep-flatten     # advanced features; the kernel client may not support all of them, which can make mapping fail, so keep only layering and disable the rest
        op_features: 
        flags: 
        create_timestamp: Wed Oct 19 19:36:17 2022
        access_timestamp: Wed Oct 19 19:36:17 2022
        modify_timestamp: Wed Oct 19 19:36:17 2022
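
The object count and the order value in this output are related by simple arithmetic: order 22 means each object is 2^22 bytes (4 MiB), and the image size divided by that gives the number of objects. A minimal sketch of the calculation, with a made-up function name:

```shell
# objects_for_image SIZE_GIB ORDER
# Number of RADOS objects needed to back an image of SIZE_GIB GiB
# when each object is 2^ORDER bytes (rounded up).
objects_for_image() {
  size_gib=$1
  order=$2
  object_bytes=$(( 1 << order ))
  image_bytes=$(( size_gib * 1024 * 1024 * 1024 ))
  echo $(( (image_bytes + object_bytes - 1) / object_bytes ))
}

objects_for_image 10 22   # prints 2560
objects_for_image 20 22   # prints 5120
```

Because the image is thin-provisioned, this is the maximum number of objects; they are only created as data is written.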

Delete an RBD image

$ rbd -p ceph-demo ls
rdb-demo.img
rdb-demo1.img

$ rbd rm -p ceph-demo --image rdb-demo1.img 
Removing image: 100% complete...done.

$ rbd -p ceph-demo ls
rdb-demo.img

Map the RBD image

Mapping the image directly fails because the kernel does not support some of its features.

$ rbd map ceph-demo/rdb-demo.img
rbd: sysfs write failed
RBD image feature set mismatch. You can disable features unsupported by the kernel with "rbd feature disable ceph-demo/rdb-demo.img object-map fast-diff deep-flatten".
In some cases useful info is found in syslog - try "dmesg | tail".
rbd: map failed: (6) No such device or address
$ rbd info ceph-demo/rdb-demo.img
rbd image 'rdb-demo.img':
        size 10 GiB in 2560 objects
        order 22 (4 MiB objects)
        snapshot_count: 0
        id: 11a95e44ddd2
        block_name_prefix: rbd_data.11a95e44ddd2
        format: 2
        features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
        op_features: 
        flags: 
        create_timestamp: Wed Oct 19 19:36:17 2022
        access_timestamp: Wed Oct 19 19:36:17 2022
        modify_timestamp: Wed Oct 19 19:36:17 2022

Disable features

# Find the feature subcommands in the help output
$ rbd -h | grep fea
    feature disable                   Disable the specified image feature.
    feature enable                    Enable the specified image feature.

# Disable features
$ rbd feature disable ceph-demo/rdb-demo.img deep-flatten
$ rbd feature disable ceph-demo/rdb-demo.img fast-diff
$ rbd feature disable ceph-demo/rdb-demo.img object-map
rbd: failed to update image features: 2022-10-19 19:49:35.576 7f6e43cf9c80 -1 librbd::Operations: one or more requested features are already disabled
(22) Invalid argument
$ rbd feature disable ceph-demo/rdb-demo.img exclusive-lock

# Inspect the image after disabling features
$ rbd info ceph-demo/rdb-demo.img
rbd image 'rdb-demo.img':
        size 10 GiB in 2560 objects
        order 22 (4 MiB objects)
        snapshot_count: 0
        id: 11a95e44ddd2
        block_name_prefix: rbd_data.11a95e44ddd2
        format: 2
        features: layering
        op_features: 
        flags: 
        create_timestamp: Wed Oct 19 19:36:17 2022
        access_timestamp: Wed Oct 19 19:36:17 2022
        modify_timestamp: Wed Oct 19 19:36:17 2022
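
The feature list to disable can be derived from the features line of rbd info instead of being typed by hand. The sketch below takes that line's value as a string, drops layering, and reverses the remaining order, which matches the order used in the transcript above (deep-flatten first, exclusive-lock last). The function name features_to_disable is hypothetical, and the assumption that reversed listing order is always a safe disable order is only checked against this one image.

```shell
# features_to_disable "FEATURES"
# FEATURES is the comma-separated value of the "features:" line from
# `rbd info`. Prints every feature except layering, in reverse order.
features_to_disable() {
  out=""
  for f in $(printf '%s' "$1" | tr ',' ' '); do
    if [ "$f" != "layering" ]; then
      out="$f${out:+ $out}"
    fi
  done
  echo "$out"
}

features_to_disable "layering, exclusive-lock, object-map, fast-diff, deep-flatten"
# prints: deep-flatten fast-diff object-map exclusive-lock
```

For new images the problem can probably be avoided up front by passing --image-feature layering to rbd create, an option listed in the rbd help create output.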

Map and mount the RBD image

# Map the RBD image
$ rbd map ceph-demo/rdb-demo.img
/dev/rbd0

# List mapped block devices
$ rbd device list
id pool      namespace image        snap device    
0  ceph-demo           rdb-demo.img -    /dev/rbd0

# Format the block device
$ mkfs.ext4 /dev/rbd0
mke2fs 1.42.9 (28-Dec-2013)
Discarding device blocks: done                            
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=1024 blocks, Stripe width=1024 blocks
655360 inodes, 2621440 blocks
131072 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=2151677952
80 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks: 
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632

Allocating group tables: done                            
Writing inode tables: done                            
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

# Mount the block device
$ mount /dev/rbd0 /mnt/

# Check the mount point and the df output
$ ls /mnt/
lost+found
$ df -h
Filesystem               Size  Used Avail Use% Mounted on
devtmpfs                 898M     0  898M   0% /dev
tmpfs                    910M     0  910M   0% /dev/shm
tmpfs                    910M   18M  893M   2% /run
tmpfs                    910M     0  910M   0% /sys/fs/cgroup
/dev/mapper/centos-root   37G  2.4G   35G   7% /
/dev/sda1               1014M  151M  864M  15% /boot
tmpfs                    910M   52K  910M   1% /var/lib/ceph/osd/ceph-0
tmpfs                    182M     0  182M   0% /run/user/0
/dev/rbd0                9.8G   37M  9.2G   1% /mnt

# Write a test file so the later online resize can be verified
$ cd /mnt/
$ echo test > test
$ ls
lost+found  test

Expanding RBD block storage

# Check the image size; it is currently 10 GiB
$ rbd info ceph-demo/rdb-demo.img
rbd image 'rdb-demo.img':
        size 10 GiB in 2560 objects
        order 22 (4 MiB objects)
        snapshot_count: 0
        id: 11a95e44ddd2
        block_name_prefix: rbd_data.11a95e44ddd2
        format: 2
        features: layering
        op_features: 
        flags: 
        create_timestamp: Wed Oct 19 19:36:17 2022
        access_timestamp: Wed Oct 19 19:36:17 2022
        modify_timestamp: Wed Oct 19 19:36:17 2022

# Show help for rbd resize
$ rbd help resize
usage: rbd resize [--pool <pool>] [--namespace <namespace>] 
                  [--image <image>] --size <size> [--allow-shrink] 
                  [--no-progress] 
                  <image-spec> 

Resize (expand or shrink) image.

Positional arguments
  <image-spec>         image specification
                       (example: [<pool-name>/[<namespace>/]]<image-name>)

Optional arguments
  -p [ --pool ] arg    pool name
  --namespace arg      namespace name
  --image arg          image name
  -s [ --size ] arg    image size (in M/G/T) [default: M]
  --allow-shrink       permit shrinking
  --no-progress        disable progress output

# Grow the image
$ rbd resize ceph-demo/rdb-demo.img --size 20G
Resizing image: 100% complete...done.

# Inspect the image again to confirm it is now 20 GiB
$ rbd info ceph-demo/rdb-demo.img
rbd image 'rdb-demo.img':
        size 20 GiB in 5120 objects
        order 22 (4 MiB objects)
        snapshot_count: 0
        id: 11a95e44ddd2
        block_name_prefix: rbd_data.11a95e44ddd2
        format: 2
        features: layering
        op_features: 
        flags: 
        create_timestamp: Wed Oct 19 19:36:17 2022
        access_timestamp: Wed Oct 19 19:36:17 2022
        modify_timestamp: Wed Oct 19 19:36:17 2022

# Check whether the filesystem has grown (not yet: /dev/rbd0 still shows 9.8G)
$ df -h
Filesystem               Size  Used Avail Use% Mounted on
devtmpfs                 898M     0  898M   0% /dev
tmpfs                    910M     0  910M   0% /dev/shm
tmpfs                    910M   18M  893M   2% /run
tmpfs                    910M     0  910M   0% /sys/fs/cgroup
/dev/mapper/centos-root   37G  2.4G   35G   7% /
/dev/sda1               1014M  151M  864M  15% /boot
tmpfs                    910M   52K  910M   1% /var/lib/ceph/osd/ceph-0
tmpfs                    182M     0  182M   0% /run/user/0
/dev/rbd0                9.8G   37M  9.2G   1% /mnt

# Grow the filesystem
$ resize2fs /dev/rbd0
resize2fs 1.42.9 (28-Dec-2013)
Filesystem at /dev/rbd0 is mounted on /mnt; on-line resizing required
old_desc_blocks = 2, new_desc_blocks = 3
The filesystem on /dev/rbd0 is now 5242880 blocks long.

# Check the filesystem size again (/dev/rbd0 now shows 20G)
$ df -h
Filesystem               Size  Used Avail Use% Mounted on
devtmpfs                 898M     0  898M   0% /dev
tmpfs                    910M     0  910M   0% /dev/shm
tmpfs                    910M   18M  893M   2% /run
tmpfs                    910M     0  910M   0% /sys/fs/cgroup
/dev/mapper/centos-root   37G  2.4G   35G   7% /
/dev/sda1               1014M  151M  864M  15% /boot
tmpfs                    910M   52K  910M   1% /var/lib/ceph/osd/ceph-0
tmpfs                    182M     0  182M   0% /run/user/0
/dev/rbd0                 20G   44M   19G   1% /mnt

# Verify that the original data is intact after the resize
$ cd /mnt/
$ ls
lost+found  test
$ cat test 
test
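
The block counts printed by mkfs.ext4 and resize2fs can be cross-checked against the image size: with a 4096-byte block size, 2621440 blocks is 10 GiB and 5242880 blocks is 20 GiB. A small sketch of that check (blocks_to_gib is a made-up helper name):

```shell
# blocks_to_gib BLOCKS BLOCK_SIZE_BYTES
# Convert a filesystem block count to whole GiB.
blocks_to_gib() {
  blocks=$1
  block_size=$2
  echo $(( blocks * block_size / 1024 / 1024 / 1024 ))
}

blocks_to_gib 2621440 4096   # prints 10 (before resize2fs)
blocks_to_gib 5242880 4096   # prints 20 (after resize2fs)
```

df reports slightly less than these figures (9.8G and 20G) because it rounds and excludes filesystem overhead.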

Ceph data write path

# Inspect the RBD image
$ rbd info ceph-demo/rdb-demo.img
rbd image 'rdb-demo.img':
        size 20 GiB in 5120 objects
        order 22 (4 MiB objects)
        snapshot_count: 0
        id: 11a95e44ddd2
        block_name_prefix: rbd_data.11a95e44ddd2
        format: 2
        features: layering
        op_features: 
        flags: 
        create_timestamp: Wed Oct 19 19:36:17 2022
        access_timestamp: Wed Oct 19 19:36:17 2022
        modify_timestamp: Wed Oct 19 19:36:17 2022

# List the RADOS objects that back the RBD image
$ rados -p ceph-demo ls | grep rbd_data.11a95e44ddd2
rbd_data.11a95e44ddd2.0000000000000e03
rbd_data.11a95e44ddd2.0000000000000800
rbd_data.11a95e44ddd2.0000000000000427
rbd_data.11a95e44ddd2.0000000000000e00
rbd_data.11a95e44ddd2.0000000000000434
rbd_data.11a95e44ddd2.0000000000000a20
rbd_data.11a95e44ddd2.0000000000000021
......

$ rados -p ceph-demo stat rbd_data.11a95e44ddd2.0000000000000e03
ceph-demo/rbd_data.11a95e44ddd2.0000000000000e03 mtime 2022-10-19 20:03:39.000000, size 4194304
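
The hexadecimal suffix of each object name is the object's index within the image. Assuming the default striping (stripe count 1, which this image uses since no striping options were passed), multiplying the index by the object size gives the offset of that object inside the block device. The function name object_offset_mib is hypothetical.

```shell
# object_offset_mib HEX_SUFFIX [OBJECT_SIZE_MIB]
# Offset in MiB, within the image, of the data object whose name ends in
# HEX_SUFFIX. Assumes default striping and 4 MiB objects unless overridden.
object_offset_mib() {
  hex=$1
  object_mib=${2:-4}
  echo $(( 0x$hex * object_mib ))
}

object_offset_mib 0000000000000e03   # prints 14348
object_offset_mib 0000000000000800   # prints 8192
```

So the object ending in e03 holds the 4 MiB of the device starting at 14348 MiB, which is only possible because the image was grown past 10 GiB.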

# Show which PG and OSDs hold an object
$ ceph osd map ceph-demo rbd_data.11a95e44ddd2.0000000000000e03
osdmap e27 pool 'ceph-demo' (1) object 'rbd_data.11a95e44ddd2.0000000000000e03' -> pg 1.aefd8300 (1.0) -> up ([1,2], p1) acting ([1,2], p1)

$ ceph osd tree
ID CLASS WEIGHT  TYPE NAME      STATUS REWEIGHT PRI-AFF 
-1       0.14639 root default                           
-3       0.04880     host node0                         
 0   hdd 0.04880         osd.0      up  1.00000 1.00000 
-5       0.04880     host node1                         
 1   hdd 0.04880         osd.1      up  1.00000 1.00000 
-7       0.04880     host node2                         
 2   hdd 0.04880         osd.2      up  1.00000 1.00000
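
The ceph osd map line shows two forms of the placement group: the raw object hash (1.aefd8300) and the actual PG (1.0), where the leading 1 is the pool id. When pg_num is a power of two, as with the 128 PGs here, the reduction from hash to PG is, to the best of my understanding, a simple bit mask; for other pg_num values Ceph uses a more involved stable-modulo step that this sketch does not implement. The function name pg_for_hash is made up.

```shell
# pg_for_hash HEX_HASH PG_NUM
# Reduce an object hash to a PG number (printed in hex, as Ceph does).
# Assumption: PG_NUM is a power of two, so the mapping is hash & (PG_NUM - 1).
pg_for_hash() {
  hash=$1
  pg_num=$2
  printf '%x\n' $(( 0x$hash & (pg_num - 1) ))
}

pg_for_hash aefd8300 128   # prints 0, matching pg 1.0 in the output above
```

The PG is then mapped by CRUSH to the acting set [1,2], that is, osd.1 on node1 (primary) and osd.2 on node2, two OSDs because the pool size was set to 2.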

Troubleshooting Ceph RBD warnings

# Check cluster status
$ ceph -s
  cluster:
    id:     97702c43-6cc2-4ef8-bdb5-855cfa90a260
    health: HEALTH_WARN
            application not enabled on 1 pool(s)
 
  services:
    mon: 3 daemons, quorum node0,node1,node2 (age 6d)
    mgr: node0(active, since 6d), standbys: node1, node2
    osd: 3 osds: 3 up (since 6d), 3 in (since 6d)
 
  data:
    pools:   1 pools, 128 pgs
    objects: 103 objects, 305 MiB
    usage:   3.6 GiB used, 146 GiB / 150 GiB avail
    pgs:     128 active+clean

# Get health details
$ ceph health detail
HEALTH_WARN application not enabled on 1 pool(s)
POOL_APP_NOT_ENABLED application not enabled on 1 pool(s)
    application not enabled on pool 'ceph-demo'
    use 'ceph osd pool application enable <pool-name> <app-name>', where <app-name> is 'cephfs', 'rbd', 'rgw', or freeform for custom applications.

# Show the application currently set on the pool
$ ceph osd pool application get ceph-demo
{}

# Tag the ceph-demo pool with the rbd application; this records the pool's type and makes it easier to manage
$ ceph osd pool application enable ceph-demo rbd
enabled application 'rbd' on pool 'ceph-demo'
$ ceph osd pool application get ceph-demo
{
    "rbd": {}
}

# Check cluster status again
$ ceph -s
  cluster:
    id:     97702c43-6cc2-4ef8-bdb5-855cfa90a260
    health: HEALTH_OK
 
  services:
    mon: 3 daemons, quorum node0,node1,node2 (age 6d)
    mgr: node0(active, since 6d), standbys: node1, node2
    osd: 3 osds: 3 up (since 6d), 3 in (since 6d)
 
  data:
    pools:   1 pools, 128 pgs
    objects: 103 objects, 305 MiB
    usage:   3.6 GiB used, 146 GiB / 150 GiB avail
    pgs:     128 active+clean
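
With more than one pool it can help to pull the affected pool names out of the health output rather than reading them by eye. The sketch below filters text in the format shown by ceph health detail above; it assumes that format is stable across releases, which is not guaranteed, and the function name pools_without_application is hypothetical.

```shell
# pools_without_application
# Reads `ceph health detail` text on stdin and prints the name of each
# pool reported as "application not enabled on pool '<name>'".
pools_without_application() {
  sed -n "s/^[[:space:]]*application not enabled on pool '\(.*\)'\$/\1/p"
}

# Example using the output captured earlier in this section:
pools_without_application <<'EOF'
HEALTH_WARN application not enabled on 1 pool(s)
POOL_APP_NOT_ENABLED application not enabled on 1 pool(s)
    application not enabled on pool 'ceph-demo'
EOF
# prints: ceph-demo
```

On a live cluster the input would come from ceph health detail piped into the function. Each pool printed still needs a deliberate choice of application (rbd, cephfs, rgw, or a custom name) before running ceph osd pool application enable.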

From: https://www.cnblogs.com/evescn/p/16813788.html