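Memory Debugging
Goals
- Obtain the call stacks of the process's memory allocations, and a flame graph showing each call path's share of memory;
- Obtain the real in-use memory figure, that is, excluding the memory cached by tcmalloc/ptmalloc.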
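How it works
Replace glibc's ptmalloc with Google's tcmalloc, which has instrumentation hooks in its allocation API. The heap profiler uses these hooks to record the call stack of each allocation.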
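To make the idea of a hook in the allocation API concrete, here is a toy sketch. It is not how tcmalloc is implemented: the names `toy::TrackedAlloc`, `toy::TrackedFree` and `toy::InUseByTag` are hypothetical, a caller-supplied tag stands in for the call stack a real profiler would capture, and the bookkeeping is not thread-safe.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <map>
#include <string>
#include <unordered_map>

namespace toy {

// One live allocation: who asked for it and how many bytes.
struct Record {
  std::string tag;
  std::size_t size;
};

// Live allocations keyed by pointer (hypothetical bookkeeping table).
inline std::unordered_map<void*, Record>& Live() {
  static std::unordered_map<void*, Record> live;
  return live;
}

// Allocation entry point with a hook: allocate, then record the request.
void* TrackedAlloc(std::size_t size, const std::string& tag) {
  void* p = std::malloc(size);
  if (p != nullptr) Live()[p] = Record{tag, size};
  return p;
}

// Free entry point with a hook: forget the record, then free.
void TrackedFree(void* p) {
  if (p == nullptr) return;
  assert(Live().count(p) == 1 && "pointer did not come from TrackedAlloc");
  Live().erase(p);
  std::free(p);
}

// In-use bytes grouped by tag, the toy equivalent of a heap profile.
std::map<std::string, std::size_t> InUseByTag() {
  std::map<std::string, std::size_t> out;
  for (const auto& kv : Live()) out[kv.second.tag] += kv.second.size;
  return out;
}

}  // namespace toy
```

Because only live allocations are kept in the table, the result counts memory that is in use by the application and ignores anything the allocator itself has cached.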
Practice
Dependencies
- FlameGraph: https://github.com/brendangregg/FlameGraph.git
- gperftools: https://github.com/gperftools/gperftools.git. In this article gperftools is installed as follows:
    # gperftools install path
    gperf_install_base_path="/var/.gperftools/release"

    # Build and install
    cd gperftools && ./autogen.sh && ./configure --prefix=${gperf_install_base_path}/ && make all -j2 && sudo make install
Test code
Save the following code as t_gperf_tools.cc
    #include <malloc.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    #include <cstdint>
    #include <iostream>
    #include <map>
    #include <memory>
    #include <thread>
    #include <vector>

    #define GPERFTOOLS_EN (1)
    #if GPERFTOOLS_EN == 1
    #include "gperftools/heap-profiler.h"
    #include "gperftools/malloc_extension.h"
    #endif

    // Deliberately leaks 8 MB per second through the C allocation API.
    void MallocTestC() {
      uint64_t *a;
      while (1) {
        a = (uint64_t *)calloc(1024 * 1024, sizeof(uint64_t));
        sleep(1);
        // 1024 * 1024 elements of sizeof(uint64_t) bytes each, so the block
        // size in MB equals sizeof(uint64_t).
        printf("Leak size %zu MB for %p\n", sizeof(uint64_t), (void *)a);
      }
    }

    // Grows a container through the C++ allocation API and clears it
    // every 40 rounds.
    void MallocTestCPP() {
      static std::vector<std::map<uint64_t, std::shared_ptr<std::vector<uint64_t>>>>
          map_vec;
      while (1) {
        if (map_vec.size() >= 40) {
          malloc_stats();  // line 30 of the original listing: before clearing
          map_vec.clear();
          map_vec.shrink_to_fit();
          // MallocExtension::instance()->ReleaseFreeMemory();
          malloc_stats();  // line 34 of the original listing: after clearing
        }

        std::map<uint64_t, std::shared_ptr<std::vector<uint64_t>>> map;
        for (size_t i = 0U; i < 200; i++) {
          std::shared_ptr<std::vector<uint64_t>> vec =
              std::make_shared<std::vector<uint64_t>>();
          // vec->reserve(1024U);
          vec->resize(1024U);
          map.emplace(i, vec);
        }
        map_vec.emplace_back(map);
        usleep(200000);
      }
    }

    int main(int argc, char **argv) {
      (void)argc;
      (void)argv;

    #if GPERFTOOLS_EN == 1
      HeapProfilerStart("/tmp/t_gperf_tools_O0");
    #endif

      std::thread thr{MallocTestCPP};
      MallocTestC();  // loops forever, so the lines below are never reached
      thr.join();

    #if GPERFTOOLS_EN == 1
      HeapProfilerStop();
    #endif
      return 0;
    }
Memory flame graph
To get the most complete call stacks, we compile with -O0.
    # Build
    g++ t_gperf_tools.cc -O0 -g -o t_gperf_tools -lpthread -ltcmalloc -L/var/.gperftools/release/lib -I/var/.gperftools/release/include

    # Run, dumping a HeapProfiler file at 1-second intervals
    LD_LIBRARY_PATH=/var/.gperftools/release/lib HEAP_PROFILE_TIME_INTERVAL="1" ./t_gperf_tools
Once the program is running, the output shows the HeapProfiler files being written:
    Starting tracking the heap
    Dumping heap profile to /tmp/t_gperf_tools_O0.0001.heap (1667187833 sec since the last dump)
    Dumping heap profile to /tmp/t_gperf_tools_O0.0002.heap (1 sec since the last dump)
    Leak size 8 MB for 0x562deff26000
    Dumping heap profile to /tmp/t_gperf_tools_O0.0003.heap (1 sec since the last dump)
    Leak size 8 MB for 0x562df0f1c000
    Dumping heap profile to /tmp/t_gperf_tools_O0.0004.heap (1 sec since the last dump)
    Leak size 8 MB for 0x562df1f0a000
    Dumping heap profile to /tmp/t_gperf_tools_O0.0005.heap (1 sec since the last dump)
    Leak size 8 MB for 0x562df2ef4000
    Dumping heap profile to /tmp/t_gperf_tools_O0.0006.heap (1 sec since the last dump)
    Leak size 8 MB for 0x562df3d50000
    Dumping heap profile to /tmp/t_gperf_tools_O0.0007.heap (1 sec since the last dump)
    Leak size 8 MB for 0x562df4d3c000
    Dumping heap profile to /tmp/t_gperf_tools_O0.0008.heap (1 sec since the last dump)
    Leak size 8 MB for 0x562df5d26000
    Dumping heap profile to /tmp/t_gperf_tools_O0.0009.heap (1 sec since the last dump)
    Leak size 8 MB for 0x562df6b80000
Generating the flame graph
    # Parse the HeapProfiler file
    /var/.gperftools/release/bin/pprof --collapsed ./t_gperf_tools /tmp/t_gperf_tools_O0.0009.heap > gperf.stacks

    # Generate the flame graph
    cat gperf.stacks | /home/user/disk/prjs/perf/mdc_perf/perf/FlameGraph/flamegraph.pl --color=mem --title="malloc() Flame Graph" --countname="calls" > gperf.svg

    # Open the flame graph
    google-chrome gperf.svg
In the resulting flame graph, read against the code, the places where memory is consumed are clearly visible.
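The same information can be read from the collapsed-stack file without rendering an SVG. The sketch below assumes the folded format that flamegraph.pl consumes, one stack per line written as `frame1;frame2;...;frameN weight`, and sums the weight under every frame, which is what the width of a flame graph box represents. The function name `WeightByFrame` is hypothetical, and the unit of the weight is whatever pprof emitted.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <istream>
#include <map>
#include <set>
#include <sstream>
#include <string>

// Sums the weight of every stack in which a frame appears (inclusive total).
std::map<std::string, long long> WeightByFrame(std::istream& in) {
  std::map<std::string, long long> totals;
  std::string line;
  while (std::getline(in, line)) {
    // The weight follows the last space; frame names may contain spaces.
    const std::size_t sep = line.rfind(' ');
    if (sep == std::string::npos) continue;  // not a "stack weight" line
    long long weight = 0;
    try {
      weight = std::stoll(line.substr(sep + 1));
    } catch (const std::exception&) {
      continue;  // trailing token is not a number
    }
    // Count each frame once per stack, even if recursion repeats it.
    std::set<std::string> seen;
    std::istringstream frames(line.substr(0, sep));
    std::string frame;
    while (std::getline(frames, frame, ';')) {
      if (!frame.empty() && seen.insert(frame).second) totals[frame] += weight;
    }
  }
  return totals;
}
```

Passing a `std::ifstream` opened on gperf.stacks would give, for example, the total attributed to `MallocTestC` versus `MallocTestCPP`.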
Actual in-use memory of the process
malloc_stats
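When an application requests memory, the request normally goes through glibc and then reaches the kernel through a syscall. To reduce the number of system calls made for allocating and freeing memory, glibc's memory manager keeps a layer of cached memory.
Both glibc's default ptmalloc and Google's tcmalloc provide the malloc_stats API, which reports the total memory the process holds and the amount actually in use. The difference between the two is the memory the process has cached in the allocator.
Taking tcmalloc as the example, the output below comes from the two malloc_stats() calls in MallocTestCPP (lines 30 and 34 of the test code listing), before and after the container is cleared. Comparing the two shows that even after shrink_to_fit, the memory the application frees is not returned to the OS: bytes in use fall from 135.4 MiB to 72.1 MiB while actual memory used stays at 138.6 MiB, and most of the freed memory lands in the page heap freelist (0.4 MiB to 58.0 MiB).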
    # Output of malloc_stats() at line 30
    ------------------------------------------------
    MALLOC:      142024744 (  135.4 MiB) Bytes in use by application
    MALLOC: +       442368 (    0.4 MiB) Bytes in page heap freelist
    MALLOC: +       120760 (    0.1 MiB) Bytes in central cache freelist
    MALLOC: +            0 (    0.0 MiB) Bytes in transfer cache freelist
    MALLOC: +        18464 (    0.0 MiB) Bytes in thread cache freelists
    MALLOC: +      2752512 (    2.6 MiB) Bytes in malloc metadata
    MALLOC:   ------------
    MALLOC: =    145358848 (  138.6 MiB) Actual memory used (physical + swap)
    MALLOC: +            0 (    0.0 MiB) Bytes released to OS (aka unmapped)
    MALLOC:   ------------
    MALLOC: =    145358848 (  138.6 MiB) Virtual address space used
    MALLOC:
    MALLOC:           4139              Spans in use
    MALLOC:              3              Thread heaps in use
    MALLOC:           8192              Tcmalloc page size
    ------------------------------------------------
    Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).
    Bytes released to the OS take up virtual address space but no physical memory.

    # Output of malloc_stats() at line 34
    ------------------------------------------------
    MALLOC:       75589672 (   72.1 MiB) Bytes in use by application
    MALLOC: +     60833792 (   58.0 MiB) Bytes in page heap freelist
    MALLOC: +      1231224 (    1.2 MiB) Bytes in central cache freelist
    MALLOC: +      1277952 (    1.2 MiB) Bytes in transfer cache freelist
    MALLOC: +      3673696 (    3.5 MiB) Bytes in thread cache freelists
    MALLOC: +      2752512 (    2.6 MiB) Bytes in malloc metadata
    MALLOC:   ------------
    MALLOC: =    145358848 (  138.6 MiB) Actual memory used (physical + swap)
    MALLOC: +            0 (    0.0 MiB) Bytes released to OS (aka unmapped)
    MALLOC:   ------------
    MALLOC: =    145358848 (  138.6 MiB) Virtual address space used
    MALLOC:
    MALLOC:            596              Spans in use
    MALLOC:              3              Thread heaps in use
    MALLOC:           8192              Tcmalloc page size
    ------------------------------------------------
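The subtraction described above can be automated once the malloc_stats text has been captured, for example by redirecting the stream it is printed to. The following is a minimal sketch that assumes the line layout shown in the output above; `ParseMallocStats` and `BytesHeldButNotInUse` are hypothetical names, not part of tcmalloc.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <map>
#include <sstream>
#include <string>

// Maps each label (e.g. "Bytes in use by application") to its number.
std::map<std::string, std::uint64_t> ParseMallocStats(const std::string& text) {
  std::map<std::string, std::uint64_t> stats;
  const std::string prefix = "MALLOC:";
  std::istringstream in(text);
  std::string line;
  while (std::getline(in, line)) {
    std::size_t i = line.find_first_not_of(" \t");
    if (i == std::string::npos || line.compare(i, prefix.size(), prefix) != 0) continue;
    i += prefix.size();
    // Skip blanks and the leading '+' or '=' sign.
    while (i < line.size() && (line[i] == ' ' || line[i] == '+' || line[i] == '=')) ++i;
    const std::size_t digits_begin = i;
    std::uint64_t value = 0;
    while (i < line.size() && std::isdigit(static_cast<unsigned char>(line[i]))) {
      value = value * 10 + static_cast<std::uint64_t>(line[i] - '0');
      ++i;
    }
    if (i == digits_begin) continue;  // separator or empty "MALLOC:" line
    while (i < line.size() && line[i] == ' ') ++i;
    // Skip the "( 135.4 MiB)" group only when it directly follows the number;
    // some labels contain parentheses of their own.
    if (i < line.size() && line[i] == '(') {
      const std::size_t close = line.find(')', i);
      if (close == std::string::npos) continue;
      i = close + 1;
    }
    const std::size_t begin = line.find_first_not_of(" \t", i);
    if (begin == std::string::npos) continue;
    const std::size_t end = line.find_last_not_of(" \t\r");
    stats[line.substr(begin, end - begin + 1)] = value;
  }
  return stats;
}

// Total held minus in use; throws std::out_of_range if a label is missing.
std::uint64_t BytesHeldButNotInUse(const std::map<std::string, std::uint64_t>& s) {
  return s.at("Actual memory used (physical + swap)") - s.at("Bytes in use by application");
}
```

Applied to the two outputs above, the difference is 3334104 bytes before the container is cleared and 69769176 bytes after. Note that this figure includes the 2752512 bytes of malloc metadata.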
MallocExtension::instance()->ReleaseFreeMemory()
When the application wants to hand the allocator's cached memory back to the kernel manually, tcmalloc provides the ReleaseFreeMemory API.
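A minimal sketch of calling this API from application code follows. It reuses the GPERFTOOLS_EN switch from the test code but defaults it to 0, so the file also builds without gperftools; build with `-DGPERFTOOLS_EN=1` and link `-ltcmalloc` to enable the real calls. `HeapUsage`, `ReadHeapUsage` and `ReleaseAllocatorCache` are hypothetical helper names. The property names passed to `GetNumericProperty` are the ones documented in gperftools' malloc_extension.h.

```cpp
#include <cassert>
#include <cstddef>

#ifndef GPERFTOOLS_EN
#define GPERFTOOLS_EN 0  // assumption: off unless the build enables it
#endif
#if GPERFTOOLS_EN == 1
#include "gperftools/malloc_extension.h"
#endif

struct HeapUsage {
  bool valid;             // false when tcmalloc is not compiled in
  std::size_t in_use;     // bytes currently allocated by the application
  std::size_t heap_size;  // bytes the allocator has obtained from the system
};

// Reads the in-use and total figures without parsing malloc_stats text.
HeapUsage ReadHeapUsage() {
  HeapUsage u{false, 0, 0};
#if GPERFTOOLS_EN == 1
  MallocExtension* ext = MallocExtension::instance();
  u.valid = ext->GetNumericProperty("generic.current_allocated_bytes", &u.in_use) &&
            ext->GetNumericProperty("generic.heap_size", &u.heap_size);
#endif
  return u;
}

// Asks tcmalloc to return freelist memory to the OS.
// Returns false when tcmalloc is not compiled in and nothing was done.
bool ReleaseAllocatorCache() {
#if GPERFTOOLS_EN == 1
  MallocExtension::instance()->ReleaseFreeMemory();
  return true;
#else
  return false;
#endif
}
```

In the test code, calling `ReleaseAllocatorCache()` in place of the commented-out line between the two malloc_stats() calls would be the natural spot to try it. Per the note tcmalloc prints with its statistics, released bytes still occupy virtual address space but no physical memory.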