
rte_malloc Analysis

Posted: 2022-12-10 13:03:07
Tags: rte, analysis, malloc, socket, elem, heap, size

Heap initialization

For the implementation, see the rte_eal_malloc_heap_init function.

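The heap is encapsulated in struct malloc_heap, and one such structure is built for each NUMA node. Heap initialization mainly pre-allocates malloc_elem elements. As shown in the figure, a single malloc_elem range may span several rte_memseg segments, so its physical memory is not necessarily contiguous. Merging requires neighbouring malloc_elem elements to be in the same rte_memseg_list and contiguous in virtual address space. To make those neighbours easy to find, malloc_elem elements are linked together in a doubly linked list.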
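The merge precondition above (same rte_memseg_list, virtually contiguous) and the doubly linked neighbour list can be sketched in a few lines. The sketch assumes a hypothetical, stripped-down `struct sketch_elem`. It is not DPDK's real `struct malloc_elem` layout, and `sketch_elems_mergeable` / `sketch_elem_insert_after` are illustrative names only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct malloc_elem: only the fields needed
 * to show neighbour lookup and the merge precondition are kept. */
struct sketch_elem {
    struct sketch_elem *prev;   /* previous element in address order */
    struct sketch_elem *next;   /* next element in address order */
    const void *msl;            /* owning memseg list (opaque here) */
    uintptr_t va;               /* start virtual address of the element */
    size_t size;                /* total size in bytes, header included */
    bool free;
};

/* Two neighbours may be merged only if they belong to the same memseg
 * list and are virtually contiguous (a ends exactly where b starts). */
static bool
sketch_elems_mergeable(const struct sketch_elem *a, const struct sketch_elem *b)
{
    return a != NULL && b != NULL &&
           a->msl == b->msl &&
           a->va + a->size == b->va;
}

/* Link elem into the address-ordered list directly after pos. */
static void
sketch_elem_insert_after(struct sketch_elem *pos, struct sketch_elem *elem)
{
    assert(pos != NULL && elem != NULL);
    elem->prev = pos;
    elem->next = pos->next;
    if (pos->next != NULL)
        pos->next->prev = elem;
    pos->next = elem;
}
```

The `prev`/`next` links give O(1) access to both neighbours when an element is freed. The explicit `msl` and address check is why list adjacency alone is not enough to merge.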

Allocating from the heap (malloc)

For the implementation, see the rte_malloc function.

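As in the kernel, the heap is managed so as to limit external fragmentation during allocation. The scheme is similar in spirit to the kernel's buddy allocator. Strictly speaking, though, DPDK's heap uses segregated free lists by size class rather than buddy pairs. The heap holds 13 free_list heads covering sizes from 256 B up to 1 GB and beyond, with each class bound 4 times the previous one. Free malloc_elem elements in the same size class are chained on the same list. A malloc call runs while holding the heap's lock (the lock field of malloc_heap). The requested size is rounded up to a cache-line multiple (see RTE_CACHE_LINE_ROUNDUP). The allocation itself is done by heap_alloc, which roughly works as follows:
(1) First, search the free lists for a malloc_elem large enough for the request (see find_suitable_element).
(2) Then carve the allocation out of that malloc_elem (see malloc_elem_alloc). The remaining space is split off into a new malloc_elem and inserted into the free_list that matches its size.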
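The size classes can be made concrete with a minimal sketch that maps a request size to one of the 13 free lists described above: the first bound is 256 B and each class is 4x larger. The `SKETCH_*` constants and `sketch_free_list_index` are hypothetical names. The arithmetic follows the description in the text and is not presented as DPDK's own code.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed constants matching the description: 13 lists, first bound
 * 2^8 = 256 B, each bound 2^2 = 4x the previous one. */
#define SKETCH_NUM_FREELISTS   13
#define SKETCH_MINSIZE_LOG2     8
#define SKETCH_LOG2_INCREMENT   2

/* Smallest n such that 2^n >= size. */
static unsigned
sketch_ceil_log2(size_t size)
{
    unsigned n = 0;

    while (n < sizeof(size_t) * 8 && ((size_t)1 << n) < size)
        n++;
    return n;
}

/* Map a block size to its free list:
 * list 0: (0, 256], list 1: (256, 1K], list 2: (1K, 4K], ...
 * list 11: (256M, 1G], list 12: everything above 1 GB. */
static size_t
sketch_free_list_index(size_t size)
{
    unsigned log2;
    size_t index;

    assert(size > 0);
    if (size <= ((size_t)1 << SKETCH_MINSIZE_LOG2))
        return 0;
    log2 = sketch_ceil_log2(size);
    index = (log2 - SKETCH_MINSIZE_LOG2 + SKETCH_LOG2_INCREMENT - 1) /
            SKETCH_LOG2_INCREMENT;
    return index < SKETCH_NUM_FREELISTS ? index : SKETCH_NUM_FREELISTS - 1;
}
```

A search in the style of find_suitable_element would start at the returned index and move on to larger lists when no element on the current list can hold the request. That is why one list per size class keeps the search short.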

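When the physical memory initially assigned to the heap runs out, DPDK can grow the heap dynamically (see alloc_more_mem_on_socket). Segments (rte_memseg) added this way do not carry the RTE_MEMSEG_FLAG_DO_NOT_FREE flag. Their physical memory can therefore be released once it is no longer in use (see malloc_elem_hide_region).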

Given the virtual address of an allocation, rte_malloc_virt2iova can compute the corresponding physical address (IOVA).

 

Freeing memory back to the heap

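For the implementation, see the rte_free function. Freeing also happens while holding the heap's lock. The address passed to rte_free is first used to locate the owning malloc_elem. That element is then released by malloc_elem_free (called from malloc_heap_free), which merges it with any adjacent free malloc_elem elements.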

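Every rte_malloc allocation has to take a lock, so it is not especially efficient. In addition, many small allocations can cause internal fragmentation, since each one pays for a header and for cache-line rounding. To address this, DPDK provides the rte_mempool memory pool, the counterpart of the kernel's SLAB allocator.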

Building a mempool on top of the heap

For the implementation, see the rte_mempool_create function.


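As the figure shows, rte_mempool organizes its cached objects (obj) with a lock-free queue (rte_ring). It also keeps a per-lcore cache for each CPU, which greatly reduces the cost of lock contention. Some implementation details:
(1) The storage of an rte_mempool consists of three or more rte_memzone regions. They hold, respectively, the rte_ring, the per-lcore caches, and the global object storage (which may itself span several rte_memzone regions).
(2) Object storage is populated mainly through rte_mempool_populate_default. It splits each rte_memzone into memory chunks according to physical contiguity, so that no obj ends up in physically discontiguous memory.
(3) Within a chunk, each element carries a header (rte_mempool_objhdr) and a trailer. The headers are chained into a linked list, so every object can be traversed through the elt_list field of rte_mempool.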
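Points (2) and (3) can be illustrated together. The sketch below carves a chunk into whole [header | object | trailer] slots, so no object crosses the chunk end, and links the headers into one list that can be walked like elt_list. `struct sketch_objhdr`, `struct sketch_pool` and `sketch_populate_chunk` are hypothetical. They only loosely echo rte_mempool_objhdr, and the trailer size is an assumed parameter rather than DPDK's actual layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical object header linking every object into one pool-wide list. */
struct sketch_objhdr {
    struct sketch_objhdr *next;
};

struct sketch_pool {
    struct sketch_objhdr *elt_list;   /* head of the header list */
    size_t elt_size;                  /* user-visible object size */
    size_t trailer_size;              /* assumed trailer size (may be 0) */
    unsigned count;                   /* objects populated so far */
};

/* Carve one chunk into [header | object | trailer] slots and append the
 * headers to mp->elt_list. Only whole slots are used, so no object
 * straddles the end of the chunk. Returns the number of objects added. */
static unsigned
sketch_populate_chunk(struct sketch_pool *mp, void *chunk, size_t len)
{
    size_t align = _Alignof(struct sketch_objhdr);
    size_t slot = sizeof(struct sketch_objhdr) + mp->elt_size + mp->trailer_size;
    struct sketch_objhdr **tail = &mp->elt_list;
    size_t off;
    unsigned n = 0;

    slot = (slot + align - 1) / align * align;  /* keep every header aligned */
    while (*tail != NULL)                       /* append after earlier chunks */
        tail = &(*tail)->next;
    for (off = 0; off + slot <= len; off += slot, n++) {
        struct sketch_objhdr *hdr =
            (struct sketch_objhdr *)((uint8_t *)chunk + off);
        hdr->next = NULL;
        *tail = hdr;
        tail = &hdr->next;
    }
    mp->count += n;
    return n;
}

/* The object itself starts right after its header. */
static void *
sketch_obj_from_hdr(struct sketch_objhdr *hdr)
{
    assert(hdr != NULL);
    return hdr + 1;
}
```

Walking `elt_list` from the pool visits every object across all chunks, which is the kind of global traversal point (3) describes.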

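Other notes:
(1) Objects cached by rte_mempool are padded for alignment (see rte_mempool_calc_obj_size).
(2) An rte_mempool is bound to a NUMA node; accessing it from another NUMA node costs some performance.
(3) The per-lcore cache is managed as a stack. Allocation takes objects from the end of the cache array, working backward. Freeing appends objects, working forward. As a result, the most recently freed object is handed out first.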
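Note (3) describes a LIFO per-lcore cache. The sketch below shows that stack discipline with a refill-when-empty and flush-half-when-full policy. Both thresholds are assumptions, and the shared backend is a plain array standing in for the lock-free rte_ring; it is neither lock-free nor DPDK's real cache code. All `sketch_*` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_CACHE_SIZE 8   /* assumed per-lcore cache capacity */

/* Stand-in for the shared rte_ring: a plain array, NOT lock-free. */
struct sketch_backend {
    void *objs[64];
    unsigned len;
};

/* Hypothetical per-lcore cache: objs[0..len), top of stack at len - 1. */
struct sketch_cache {
    void *objs[SKETCH_CACHE_SIZE];
    unsigned len;
};

/* Get: if the cache is empty, refill up to half capacity from the
 * backend, then pop from the end of the array (LIFO). */
static void *
sketch_cache_get(struct sketch_cache *c, struct sketch_backend *b)
{
    if (c->len == 0) {
        while (c->len < SKETCH_CACHE_SIZE / 2 && b->len > 0)
            c->objs[c->len++] = b->objs[--b->len];
    }
    return c->len > 0 ? c->objs[--c->len] : NULL;
}

/* Put: push onto the end of the array; if the cache is full, first
 * flush the top half back to the backend. */
static void
sketch_cache_put(struct sketch_cache *c, struct sketch_backend *b, void *obj)
{
    if (c->len == SKETCH_CACHE_SIZE) {
        while (c->len > SKETCH_CACHE_SIZE / 2) {
            assert(b->len < sizeof(b->objs) / sizeof(b->objs[0]));
            b->objs[b->len++] = c->objs[--c->len];
        }
    }
    c->objs[c->len++] = obj;
}
```

Because the most recently freed object is reused first, it is likely still warm in that CPU's cache. In the common case get/put touch only the lcore's own array and never the shared ring.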

 

 

 

/*
 * Allocate memory on default heap.
 */
void *
rte_malloc(const char *type, size_t size, unsigned align)
{
    return rte_malloc_socket(type, size, align, SOCKET_ID_ANY);
}

/*
 * Allocate memory on specified heap.
 */
void *
rte_malloc_socket(const char *type, size_t size, unsigned int align,
        int socket_arg)
{
    return malloc_socket(type, size, align, socket_arg, true);
}

 

static void *
malloc_socket(const char *type, size_t size, unsigned int align,
        int socket_arg, const bool trace_ena)
{
    void *ptr;

    /* Return NULL if size is 0, or if align is non-zero and not a power
     * of two (align == 0 is allowed and is treated as 1 below).
     */
    if (size == 0 || (align && !rte_is_power_of_2(align)))
        return NULL;

    /* if there are no hugepages and if we are not allocating from an
     * external heap, use memory from any socket available. checking for
     * socket being external may return -1 in case of invalid socket, but
     * that's OK - if there are no hugepages, it doesn't matter.
     * (No hugepages means internal_config.no_hugetlbfs == 1, i.e. the
     * application was initialized without hugepages; socket_arg is then
     * reset to SOCKET_ID_ANY.)
     */
    if (rte_malloc_heap_socket_is_external(socket_arg) != 1 &&
            !rte_eal_has_hugepages())
        socket_arg = SOCKET_ID_ANY;

    /* malloc_heap_alloc() performs the actual allocation */
    ptr = malloc_heap_alloc(type, size, socket_arg, 0,
            align == 0 ? 1 : align, 0, false);

    if (trace_ena)
        rte_eal_trace_mem_malloc(type, size, align, socket_arg, ptr);
    return ptr;
}

 

 


void *
malloc_heap_alloc(const char *type, size_t size, int socket_arg,
        unsigned int flags, size_t align, size_t bound, bool contig)
{
    int socket, heap_id, i;
    void *ret;

    /* return NULL if size is 0, or if align is non-zero and not a power of two */
    if (size == 0 || (align && !rte_is_power_of_2(align)))
        return NULL;

    if (!rte_eal_has_hugepages() && socket_arg < RTE_MAX_NUMA_NODES)
        socket_arg = SOCKET_ID_ANY;

    if (socket_arg == SOCKET_ID_ANY)
        socket = malloc_get_numa_socket();
    else
        socket = socket_arg;

    /* turn socket ID into heap ID */
    heap_id = malloc_socket_to_heap_id(socket);
    /* if heap id is negative, socket ID was invalid */
    if (heap_id < 0)
        return NULL;

    /* try the heap of the requested socket (or of the calling lcore's socket) */
    ret = malloc_heap_alloc_on_heap_id(type, size, heap_id, flags, align,
            bound, contig);
    if (ret != NULL || socket_arg != SOCKET_ID_ANY)
        return ret;

    /* try other heaps. we are only iterating through native DPDK sockets,
     * so external heaps won't be included. This point is reached only when
     * the caller passed SOCKET_ID_ANY and the local socket's heap could
     * not satisfy the request; an explicitly requested socket never falls
     * back to other sockets.
     */
    for (i = 0; i < (int) rte_socket_count(); i++) {
        if (i == heap_id)
            continue;
        ret = malloc_heap_alloc_on_heap_id(type, size, i, flags, align,
                bound, contig);
        if (ret != NULL)
            return ret;
    }
    return NULL;
}
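The fallback rule in malloc_heap_alloc() is easy to misread: other sockets are tried only when the caller passed SOCKET_ID_ANY. Below is a minimal model of that control flow. It assumes socket IDs map one-to-one to heap IDs (DPDK uses malloc_socket_to_heap_id for this) and uses a hypothetical per-heap free-byte counter in place of real heaps. `SKETCH_SOCKET_ID_ANY` and the other `sketch_*` names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_SOCKET_ID_ANY (-1)   /* hypothetical stand-in for SOCKET_ID_ANY */
#define SKETCH_NUM_HEAPS      4     /* assumed number of NUMA heaps */

/* Hypothetical per-heap state: only the number of free bytes. */
static size_t sketch_heap_free[SKETCH_NUM_HEAPS];

/* Stand-in for malloc_heap_alloc_on_heap_id(): returns the heap id on
 * success, -1 on failure. */
static int
sketch_alloc_on_heap(int heap_id, size_t size)
{
    assert(heap_id >= 0 && heap_id < SKETCH_NUM_HEAPS);
    if (sketch_heap_free[heap_id] < size)
        return -1;
    sketch_heap_free[heap_id] -= size;
    return heap_id;
}

/* Same order of attempts as malloc_heap_alloc(): requested (or local)
 * heap first; other heaps only if the caller passed "any socket". */
static int
sketch_heap_alloc(int socket_arg, int local_socket, size_t size)
{
    int heap_id = socket_arg == SKETCH_SOCKET_ID_ANY ? local_socket : socket_arg;
    int ret;

    if (heap_id < 0 || heap_id >= SKETCH_NUM_HEAPS)
        return -1;                          /* invalid socket */
    ret = sketch_alloc_on_heap(heap_id, size);
    if (ret >= 0 || socket_arg != SKETCH_SOCKET_ID_ANY)
        return ret;
    for (int i = 0; i < SKETCH_NUM_HEAPS; i++) {
        if (i == heap_id)
            continue;
        ret = sketch_alloc_on_heap(i, size);
        if (ret >= 0)
            return ret;
    }
    return -1;
}
```

With an explicit socket, a failure is reported to the caller instead of silently returning remote NUMA memory. That is consistent with the mempool note that cross-NUMA access costs performance.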

 

 

/* this will try lower page sizes first */
static void *
malloc_heap_alloc_on_heap_id(const char *type, size_t size,
        unsigned int heap_id, unsigned int flags, size_t align,
        size_t bound, bool contig)
{
    struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
    struct malloc_heap *heap = &mcfg->malloc_heaps[heap_id];
    unsigned int size_flags = flags & ~RTE_MEMZONE_SIZE_HINT_ONLY;
    int socket_id;
    void *ret;
    const struct internal_config *internal_conf =
        eal_get_internal_configuration();

    rte_spinlock_lock(&(heap->lock));

    align = align == 0 ? 1 : align;

    /* for legacy mode, try once and with all flags */
    if (internal_conf->legacy_mem) {
        ret = heap_alloc(heap, type, size, flags, align, bound, contig);
        goto alloc_unlock;
    }

    /*
     * we do not pass the size hint here, because even if allocation fails,
     * we may still be able to allocate memory from appropriate page sizes,
     * we just need to request more memory first.
     */

    socket_id = rte_socket_id_by_idx(heap_id);
    /*
     * if socket ID is negative, we cannot find a socket ID for this heap -
     * which means it's an external heap. those can have unexpected page
     * sizes, so if the user asked to allocate from there - assume user
     * knows what they're doing, and allow allocating from there with any
     * page size flags.
     */
    if (socket_id < 0)
        size_flags |= RTE_MEMZONE_SIZE_HINT_ONLY;

    ret = heap_alloc(heap, type, size, size_flags, align, bound, contig);
    if (ret != NULL)
        goto alloc_unlock;

    /* if socket ID is invalid, this is an external heap */
    if (socket_id < 0)
        goto alloc_unlock;

    if (!alloc_more_mem_on_socket(heap, size, socket_id, flags, align,
            bound, contig)) {
        ret = heap_alloc(heap, type, size, flags, align, bound, contig);

        /* this should have succeeded */
        if (ret == NULL)
            RTE_LOG(ERR, EAL, "Error allocating from heap\n");
    }
alloc_unlock:
    rte_spinlock_unlock(&(heap->lock));
    return ret;
}
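In non-legacy mode, malloc_heap_alloc_on_heap_id() follows a try / grow / retry order under the heap lock. External heaps are never grown. The model below reproduces only that order of attempts. The heap is a hypothetical pair of counters: free bytes, plus a "growable" budget standing in for hugepages that could still be mapped. Real growth via alloc_more_mem_on_socket works in whole pages and is not modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical heap model; not DPDK's struct malloc_heap. */
struct sketch_heap {
    size_t free_bytes;       /* bytes currently available in the heap */
    size_t growable_bytes;   /* memory that could still be added */
    bool external;           /* external heaps cannot be grown */
};

/* Stand-in for heap_alloc(): succeeds only if enough bytes are free. */
static bool
sketch_try_alloc(struct sketch_heap *h, size_t size)
{
    if (h->free_bytes < size)
        return false;
    h->free_bytes -= size;
    return true;
}

/* Stand-in for alloc_more_mem_on_socket(): 0 on success, -1 on failure. */
static int
sketch_grow(struct sketch_heap *h, size_t size)
{
    if (h->growable_bytes < size)
        return -1;
    h->growable_bytes -= size;
    h->free_bytes += size;
    return 0;
}

/* Try, then (for internal heaps only) grow and retry once. In DPDK the
 * heap spinlock is held across this whole sequence. */
static bool
sketch_alloc_on_heap(struct sketch_heap *h, size_t size)
{
    assert(h != NULL);
    if (sketch_try_alloc(h, size))
        return true;
    if (h->external)
        return false;
    if (sketch_grow(h, size) == 0)
        return sketch_try_alloc(h, size);
    return false;
}
```

Retrying only after a successful grow matches the upstream comment "this should have succeeded". If that retry still fails, DPDK logs an error rather than looping.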

 

 

/*
 * Main function to allocate a block of memory from the heap.
 * It locks the free list, scans it, and adds a new memseg if the
 * scan fails. Once the new memseg is added, it re-scans and should return
 * the new element after releasing the lock.
 */
static void *
heap_alloc(struct malloc_heap *heap, const char *type __rte_unused, size_t size,
        unsigned int flags, size_t align, size_t bound, bool contig)
{
    struct malloc_elem *elem;

    /* round size and align up to a multiple of RTE_CACHE_LINE_SIZE */
    size = RTE_CACHE_LINE_ROUNDUP(size);
    align = RTE_CACHE_LINE_ROUNDUP(align);

    /* roundup might cause an overflow */
    if (size == 0)
        return NULL;

    /* find a free element in the heap that can hold the request */
    elem = find_suitable_element(heap, size, flags, align, bound, contig);
    if (elem != NULL) {
        /* carve the requested block out of that element */
        elem = malloc_elem_alloc(elem, size, align, bound, contig);

        /* increase heap's count of allocated elements */
        heap->alloc_count++;
    }

    /* Note: the returned pointer is the start of the data area, just past
     * the malloc_elem header; &elem[1] advances the pointer by
     * sizeof(struct malloc_elem).
     */
    return elem == NULL ? NULL : (void *)(&elem[1]);
}
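The size check right after RTE_CACHE_LINE_ROUNDUP exists because rounding up can wrap around to 0 for sizes near SIZE_MAX. The sketch below shows that round-up arithmetic, assuming a 64-byte cache line. DPDK takes the real value from RTE_CACHE_LINE_SIZE, and `sketch_cache_line_roundup` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cache line size (must be a power of two). */
#define SKETCH_CACHE_LINE 64u

/* Round x up to a multiple of the cache line size. For x close to
 * SIZE_MAX the addition wraps, and the result becomes 0. */
static size_t
sketch_cache_line_roundup(size_t x)
{
    return (x + SKETCH_CACHE_LINE - 1) & ~((size_t)SKETCH_CACHE_LINE - 1);
}
```

For example, 1 and 64 both round to 64, and 65 rounds to 128. A request of SIZE_MAX bytes rounds to 0, which heap_alloc() rejects before searching the free lists.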

 

 

void
rte_free(void *addr)
{
    /* mem_free() returns void, so just call it */
    mem_free(addr, true);
}

 

 

/* Free the memory space back to heap */
static void
mem_free(void *addr, const bool trace_ena)
{
    if (trace_ena)
        rte_eal_trace_mem_free(addr);

    if (addr == NULL)
        return;
    if (malloc_heap_free(malloc_elem_from_data(addr)) < 0)
        RTE_LOG(ERR, EAL, "Error: Invalid memory\n");
}

 

 

/*
 * Given a pointer to the start of a memory block returned by malloc, get
 * the actual malloc_elem header for that block.
 */
static inline struct malloc_elem *
malloc_elem_from_data(const void *data)
{
    if (data == NULL)
        return NULL;

    struct malloc_elem *elem = RTE_PTR_SUB(data, MALLOC_ELEM_HEADER_LEN);
    if (!malloc_elem_cookies_ok(elem))
        return NULL;
    return elem->state != ELEM_PAD ? elem : RTE_PTR_SUB(elem, elem->pad);
}

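If the pointer passed in is not NULL, malloc_elem_from_data() recovers the real start of the block. It steps addr back by one header length (MALLOC_ELEM_HEADER_LEN) and checks the header cookies. If that header is only alignment padding (ELEM_PAD), it follows the pad offset back to the real header. The element is then handed to malloc_heap_free(), which takes the heap lock and calls malloc_elem_free().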
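The pointer arithmetic on both sides can be shown with a toy header. Allocation returns `&elem[1]` (data right after the header). Freeing steps back one header and, if that header is a padding marker, follows `pad` back to the real one. `struct sketch_hdr` and its `SKETCH_*` states are hypothetical. DPDK's real malloc_elem has many more fields and also checks cookies, which this sketch omits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, minimal header; not DPDK's struct malloc_elem. */
enum sketch_state { SKETCH_FREE, SKETCH_BUSY, SKETCH_PAD };

struct sketch_hdr {
    enum sketch_state state;
    size_t pad;     /* for a PAD header: byte distance back to the real header */
};

/* Allocation side: user data starts right after the header,
 * like "return &elem[1]" in heap_alloc(). */
static void *
sketch_data_from_hdr(struct sketch_hdr *h)
{
    assert(h != NULL);
    return &h[1];
}

/* Free side: step back one header; if it is only a padding header
 * (inserted to satisfy alignment), follow pad back to the real one. */
static struct sketch_hdr *
sketch_hdr_from_data(void *data)
{
    struct sketch_hdr *h;

    if (data == NULL)
        return NULL;
    h = (struct sketch_hdr *)((uintptr_t)data - sizeof(*h));
    return h->state != SKETCH_PAD ?
        h : (struct sketch_hdr *)((uintptr_t)h - h->pad);
}
```

Storing only a back-offset in the padding header lets an aligned allocation be freed with the same pointer the caller received. No separate lookup table is needed.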

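Freeing is essentially free-list bookkeeping: the newly freed block is put back onto a free list according to these rules:

  • If the block that the freed block's next pointer points to is free, merge the two and remove the next block from its free list.
  • If the block that the freed block's prev pointer points to is free, merge the freed block into that previous block as well.
  • Insert the resulting block (merged, or unchanged if neither neighbour was free) at the head of the corresponding free list, where "corresponding" means the list selected by the block's size.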
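The three rules can be traced in a small model. It uses a hypothetical `struct sketch_blk` whose address-order neighbours are assumed to be memory-adjacent, plus 4 size-class lists 4x apart. The free lists are singly linked for brevity, whereas DPDK links them in both directions. The point is the order of operations: a neighbour is removed from its list before its size changes, and the merged block goes to the head of the list for its new size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical block: address-order neighbours (assumed memory-adjacent)
 * plus a link for the free list it sits on. */
struct sketch_blk {
    struct sketch_blk *prev, *next;   /* address-order neighbours */
    struct sketch_blk *free_next;     /* link inside a free list */
    size_t size;
    bool free;
};

#define SKETCH_NLISTS 4   /* assumed: bounds 256, 1K, 4K, then "larger" */

static struct sketch_blk *free_head[SKETCH_NLISTS];

static size_t
sketch_list_index(size_t size)
{
    size_t idx = 0, bound = 256;

    while (idx < SKETCH_NLISTS - 1 && size > bound) {
        idx++;
        bound *= 4;
    }
    return idx;
}

/* Unlink b from the list chosen by its CURRENT size. */
static void
sketch_list_remove(struct sketch_blk *b)
{
    struct sketch_blk **pp = &free_head[sketch_list_index(b->size)];

    while (*pp != NULL && *pp != b)
        pp = &(*pp)->free_next;
    if (*pp == b)
        *pp = b->free_next;
    b->free_next = NULL;
}

/* Absorb b->next into b. */
static void
sketch_join_next(struct sketch_blk *b)
{
    struct sketch_blk *n = b->next;

    b->size += n->size;
    b->next = n->next;
    if (n->next != NULL)
        n->next->prev = b;
}

static void
sketch_free(struct sketch_blk *b)
{
    b->free = true;
    if (b->next != NULL && b->next->free) {   /* rule 1: merge with next */
        sketch_list_remove(b->next);
        sketch_join_next(b);
    }
    if (b->prev != NULL && b->prev->free) {   /* rule 2: merge into prev */
        sketch_list_remove(b->prev);
        b = b->prev;
        sketch_join_next(b);
    }
    /* rule 3: push onto the head of the list for the final size */
    size_t idx = sketch_list_index(b->size);
    assert(idx < SKETCH_NLISTS);
    b->free_next = free_head[idx];
    free_head[idx] = b;
}
```

Freeing the middle of three adjacent 100-byte blocks, after its two neighbours are already free, leaves a single 300-byte block. That block moves from the first list to the second, because 300 exceeds the 256-byte bound.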

 




From: https://blog.51cto.com/u_15404950/5927488
