gdbprof分析ceph的4K-randwrite性能

gdbprof，是一款基于系统时间（而非cpu事件）的性能分析工具，它通过sampling的方式进行性能统计的。每隔period（默认为0.1秒），给gdb发送sigint信号，然后循环每个thread，根据该thread的call trace，统计函数的调用情况。它是基于python实现的，中间用到了gdb的python api。
gdbprof源码本文采用的gdbprof优化版本本人forked版本，随时欢迎。

实验环境

搭建ceph-osd集群（本文用了3个osd.0、osd.1、osd.2）
服务端：server_host（osd）
客户端：client_host（mons、mgrs）
服务端需安装gdb，客户端需安装fio（都自行解决），下载gdbprof

实验过程

（1）简单查看下集群是否正常：

$ceph osd tree

$ceph -s

（2）查看所有osd对应进程及进程号

$ps aux|grep ceph-osd

实验一求解耗系统资源的线程（单image和10-images）

客户端（server_host）操作：
（1）fio 创建pool（或默认rbd）和image，并进行后续操作
$ rbd create –pool ymg –image img01 –size 40G
（2）先进行填充实验
$fio -direct=1 -iodepth=256 -ioengine=rbd -pool=rbd -rbdname=img01 -rw=write -bs=1M -size=40G -ramp_time=5 -group_reporting -name=full-fill
（3）再进行4k-randWrite的单image操作
$ fio -direct=1 -iodepth=256 -ioengine=rbd -pool=rbd -rbdname=img01 -rw=randwrite -bs=4K -runtime=300 -numjobs=1 -ramp_time=5 -group_reporting -name=one_gdbprof
（4）与此同时，服务端机器(client_host)需要并行执行top命令
$top -p 7922 -H
也即追踪进程7922(随意选一个)在过程中的所有线程对资源消耗情况，结果如下所示：

对比实验：10个images的4k-randWrite的实验操作：
$ BS=4k RW=randwrite fio images.fio
得先创建10个images
$rbd create ymg/img00 -s 40G
images.fio内容如下所示：

  [global]
  direct=1
  ioengine=rbd
  pool=ymg
  bs=${BS}
  size=40G
  time_based
  runtime=500
  ramp_time=5
  iodepth=256
  rw=${RW}
  group_reporting

  [00]
  rbdname=img00
  [01]
  rbdname=img01
  [02]
  rbdname=img02
  [03]
  rbdname=img03
  [04]
  rbdname=img04
  [05]
  rbdname=img05
  [06]
  rbdname=img06
  [07]
  rbdname=img07
  [08]
  rbdname=img08
  [09]
  rbdname=img09

服务端执行 $top -p 7922 -H
结果如下所示

总结
由上面的一对儿实验可知：4k-randwrite过程中，比较消耗资源的线程有：msgr-worker-0(7926)/msgr-worker-1(7927)、bstore_kv_sync(7972)、rocksdb::bg0(21792)、finisher(7971)、bstore_kv_final(7973)、log(7924)、tp_osd_tp(8109)
接下来，我们需要具体看这些线程中具体哪些函数最消耗时间。这就是gdbprof的亮相了。

实验二求解耗系统资源的具体函数（单image）

在客户端进行4k-randWrite操作同时（fio命令），服务端要跑gdbprof工具（可下面参考原文中的.docx文件），切换到gdbprof目录，有gdbprof.py
$vim gdbprof.py
line140，调节采样运行时间（无需太短或长），据说单位为秒，但感觉是采样次数，也就是样本个数，本人设过500，1000，2000，5000(等待时间太长)，如下所示：

$sudo gdb -ex ‘set pagination off’ -ex ‘attach 7922’ -ex ‘source gdbprof.py’ -ex ‘profile begin’ -ex ‘quit’
注：在运行过程中，可能会出现一些下面的debuginfo信息
$yum install ceph-debuginfo.x86_64 –enablerepo didi_deph

运行结束后就可以看到打印出来的结果，共有69个线程的信息
先找出打印出的线程函数信息与上面线程的对应关系：
$gdb
(gdb) attach 7922
(gdb) info threads

接着，贴出6个较为耗资源的线程的结果：Thread: 2===21792===rokcsdb::bg0、Thread: 28===8109===tp_osd_tp、Thread: 51===7973===bstore_kv_final、Thread: 52===7972===bstore_kv_sync、Thread: 53===7971===finisher、Thread: 68===7926===msgr_worker_0，可以看出哪些函数较为消耗资源，由于结果较长，部分如下所示：

Thread: 2===21792===rokcsdb::bg0

+ 100.00% clone
  + 100.00% start_thread
    + 100.00% None
      + 100.00% rocksdb::ThreadPoolImpl::Impl::BGThreadWrapper
        + 100.00% rocksdb::ThreadPoolImpl::Impl::BGThread
          + 69.00% rocksdb::DBImpl::BackgroundCallCompaction
          | + 69.00% rocksdb::DBImpl::BackgroundCompaction
          |   + 69.00% rocksdb::CompactionJob::Run
          |     + 69.00% rocksdb::CompactionJob::ProcessKeyValueCompaction
          |       + 56.80% rocksdb::CompactionIterator::Next
          |       | + 42.20% rocksdb::MergingIterator::Next
          |       | | + 42.00% Next
          |       | | | + 25.00% b::(anonymous namespace)::TwoLevelIterator::Next
          |       | | | | + 25.00% Next
          |       | | | |   + 24.40% b::(anonymous namespace)::TwoLevelIterator::SkipEmptyDataBlocksForward
          |       | | | |   | + 24.40% b::(anonymous namespace)::TwoLevelIterator::InitDataBlock
          |       | | | |   |   + 24.20% rocksdb::BlockBasedTable::BlockEntryIteratorState::NewSecondaryIterator
          |       | | | |   |   | + 24.20% rocksdb::BlockBasedTable::NewDataBlockIterator
          |       | | | |   |   |   + 23.60% b::(anonymous namespace)::ReadBlockFromFile
          |       | | | |   |   |   | + 23.40% rocksdb::ReadBlockContents
          |       | | | |   |   |   | | + 23.40% ReadBlock
          |       | | | |   |   |   | |   + 22.40% rocksdb::RandomAccessFileReader::Read
          |       | | | |   |   |   | |   | + 22.40% b::(anonymous namespace)::ReadaheadRandomAccessFile::Read
          |       | | | |   |   |   | |   |   + 22.40% ReadIntoBuffer
          |       | | | |   |   |   | |   |     + 22.40% BlueRocksRandomAccessFile::Read
          |       | | | |   |   |   | |   |       + 22.40% read_random
          |       | | | |   |   |   | |   |         + 22.40% BlueFS::_read_random
          |       | | | |   |   |   | |   |           + 22.40% KernelDevice::read_random
          |       | | | |   |   |   | |   |             + 22.40% pread
          |       | | | |   |   |   | |   |               + 22.40% pread64
          |       | | | |   |   |   | |   + 1.00% Value
          |       | | | |   |   |   | |     + 1.00% b::crc32c::ExtendImpl<rocksdb::crc32c::Fast_CRC32>
          |       | | | |   |   |   | |       + 1.00% Fast_CRC32
          |       | | | |   |   |   | |         + 1.00% Slow_CRC32
          |       | | | |   |   |   | + 0.20% rocksdb::Block::Block(rocksdb::BlockContents&&, unsigned long, unsigned long, rocksdb::Statistics*)
          |       | | | |   |   |   |   + 0.20% BlockContents
          |       | | | |   |   |   |     + 0.20% operator=
          |       | | | |   |   |   + 0.40% rocksdb::BlockBasedTable::MaybeLoadDataBlockToCache
          |       | | | |   |   |     + 0.40% rocksdb::BlockBasedTable::GetDataBlockFromCache
          |       | | | |   |   |       + 0.40% b::(anonymous namespace)::GetEntryFromCache
          |       | | | |   |   |         + 0.40% rocksdb::ShardedCache::Lookup
          |       | | | |   |   |           + 0.20% rocksdb::LRUCache::GetShard
          |       | | | |   |   + 0.20% b::(anonymous namespace)::TwoLevelIterator::SetSecondLevelIterator
          |       | | | |   + 0.40% rocksdb::BlockIter::ParseNextKey
          |       | | | |   | + 0.20% TrimAppend
          |       | | | |   | | + 0.20% EnlargeBufferIfNeeded
          |       | | | |   | |   + 0.20% tc_newarray
          |       | | | |   | + 0.20% DecodeEntry
          |       | | | |   + 0.20% b::(anonymous namespace)::TwoLevelIterator::Next
          |       | | | |     + 0.20% Next
          |       | | | + 17.00% b::(anonymous namespace)::TwoLevelIterator::SkipEmptyDataBlocksForward
          |       | | |   + 17.00% b::(anonymous namespace)::TwoLevelIterator::InitDataBlock
          |       | | |     + 17.00% rocksdb::BlockBasedTable::BlockEntryIteratorState::NewSecondaryIterator
          |       | | |       + 17.00% rocksdb::BlockBasedTable::NewDataBlockIterator
          |       | | |         + 16.80% b::(anonymous namespace)::ReadBlockFromFile
          |       | | |           + 16.80% rocksdb::ReadBlockContents
          |       | | |             + 16.80% ReadBlock
          |       | | |               + 16.60% rocksdb::RandomAccessFileReader::Read
          |       | | |               | + 16.60% b::(anonymous namespace)::ReadaheadRandomAccessFile::Read
          |       | | |               |   + 16.40% ReadIntoBuffer
          |       | | |               |   | + 16.40% BlueRocksRandomAccessFile::Read
          |       | | |               |   |   + 16.40% read_random
          |       | | |               |   |     + 16.40% BlueFS::_read_random
          |       | | |               |   |       + 16.40% KernelDevice::read_random
          |       | | |               |   |         + 16.40% pread
          |       | | |               |   |           + 16.40% pread64
          |       | | |               |   + 0.20% TryReadFromCache
          |       | | |               |     + 0.20% memcpy
          |       | | |               |       + 0.20% __memcpy_ssse3_back
          |       | | |               + 0.20% Value
          |       | | |                 + 0.20% b::crc32c::ExtendImpl<rocksdb::crc32c::Fast_CRC32>
          |       | | |                   + 0.20% Fast_CRC32
          |       | | |                     + 0.20% Slow_CRC32
          |       | | + 0.20% replace_top
          |       | |   + 0.20% downheap
          |       | |     + 0.20% operator()
          |       | + 14.60% rocksdb::CompactionIterator::NextFromInput
          |       |   + 13.20% rocksdb::MergingIterator::Next
          |       |   | + 12.80% Next
          |       |   | | + 9.40% b::(anonymous namespace)::TwoLevelIterator::Next
          |       |   | | | + 9.40% Next
          |       |   | | |   + 9.20% b::(anonymous namespace)::TwoLevelIterator::SkipEmptyDataBlocksForward
          |       |   | | |   | + 9.20% b::(anonymous namespace)::TwoLevelIterator::InitDataBlock
        ......
        ......
        ......

Thread: 28===8109===tp_osd_tp
......
......
......

具体文件可下载由上面结果可以知道6个线程中哪些函数较为消耗资源，如下所示：
（1）Thread: 2===21792===rokcsdb::bg0：