PostgreSQL Memory Management: Memory Contexts


PostgreSQL 8.4.1 memory management

[Figure: memory management]

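Shared memory holds data common to all processes, such as lock variables, inter-process communication state, and buffers. Local memory, by contrast, is private to each backend process and serves as its working area: it holds that process's caches, transaction-management information, process information, and so on.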

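To prevent conflicts when several processes access data in shared memory concurrently, PostgreSQL provides lightweight locks that enforce mutually exclusive access to the same shared data. PostgreSQL also uses shared memory to implement IPC (inter-process communication) and invalidation-message sharing. In addition, the storage manager provides memory contexts (MemoryContext) to manage allocation and release of memory uniformly, which makes memory management both more efficient and safer.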

Memory Contexts

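All memory allocation in the system takes place in memory contexts, each with its own semantics, and every piece of memory allocated in a context is recorded by that context.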

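In PostgreSQL, **each backend process owns several private memory contexts, and the memory contexts of a process form a tree** whose root is TopMemoryContext.
Under the root are several child nodes, each serving a different functional module. For example, CacheMemoryContext manages caches and ErrorContext is used for error handling. Each child node can in turn have children of its own.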

[Figure: memory context tree]

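The tree makes it possible to track how memory contexts are created and used within a process. A newly created context is added as a child of an existing context. When memory is cleaned up, traversing the tree starting from the root releases all the memory held by every node.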
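The following standalone sketch shows the parent/firstchild/nextchild linkage in practice. ToyContext, toy_create, toy_unlink and toy_delete are hypothetical names, not PostgreSQL APIs. The node keeps only the tree fields of MemoryContextData and owns no memory of its own. New children are linked at the head of the parent's list, which matches the MemoryContextCreate code later in this article. In this sketch, deletion removes all descendants first.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy node: only the tree fields of MemoryContextData. */
typedef struct ToyContext
{
	struct ToyContext *parent;
	struct ToyContext *firstchild;	/* head of the child list */
	struct ToyContext *nextchild;	/* next sibling under the same parent */
	const char *name;
} ToyContext;

/* Create a node and link it at the head of its parent's child list. */
static ToyContext *
toy_create(ToyContext *parent, const char *name)
{
	ToyContext *node = calloc(1, sizeof(ToyContext));

	if (node == NULL)
		abort();
	node->name = name;
	if (parent != NULL)
	{
		node->parent = parent;
		node->nextchild = parent->firstchild;
		parent->firstchild = node;
	}
	return node;
}

/* Remove node from its parent's child list (no-op for a root). */
static void
toy_unlink(ToyContext *node)
{
	ToyContext **link;

	if (node->parent == NULL)
		return;
	for (link = &node->parent->firstchild; *link != node; link = &(*link)->nextchild)
		assert(*link != NULL);	/* node must be on its parent's list */
	*link = node->nextchild;
}

/* Delete node and all of its descendants; returns the number of nodes freed. */
static int
toy_delete(ToyContext *node)
{
	int			freed = 0;

	while (node->firstchild != NULL)
		freed += toy_delete(node->firstchild);	/* each child unlinks itself */
	toy_unlink(node);
	free(node);
	return freed + 1;
}
```

Consider creating TopMemoryContext, then ErrorContext and CacheMemoryContext under it. Because of head insertion, CacheMemoryContext ends up as the root's firstchild. Deleting CacheMemoryContext frees it and its own children, and ErrorContext stays linked under the root.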

The data structure that describes a memory context node:

typedef struct MemoryContextData
{
	NodeTag		type;			/* node type: identifies exact kind of context */
	MemoryContextMethods *methods;		/* memory-handling functions: virtual function table */
	MemoryContext parent;		/* parent node: NULL if no parent (toplevel context) */
	MemoryContext firstchild;	/* first child: head of linked list of children */
	MemoryContext nextchild;	/* next sibling: next child of same parent */
	char	   *name;			/* node name: context name (just for debugging) */
} MemoryContextData;

The data structure of the methods field:

typedef struct MemoryContextMethods // a collection of function pointers
{
	void	   *(*alloc) (MemoryContext context, Size size);// allocate memory
	/* call this free_p in case someone #define's free() */
	void		(*free_p) (MemoryContext context, void *pointer);// free memory
	void	   *(*realloc) (MemoryContext context, void *pointer, Size size);// reallocate memory
	void		(*init) (MemoryContext context);// initialize a context
	void		(*reset) (MemoryContext context);// reset a context
	void		(*delete) (MemoryContext context);// delete a context
	Size		(*get_chunk_space) (MemoryContext context, void *pointer);// report the space a chunk occupies
	bool		(*is_empty) (MemoryContext context);// check whether a context is empty
	void		(*stats) (MemoryContext context, int level);// print context statistics
#ifdef MEMORY_CONTEXT_CHECKING
	void		(*check) (MemoryContext context);// check all chunks
#endif
} MemoryContextMethods;
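MemoryContextMethods is an ordinary C virtual-function table. The sketch below uses hypothetical names (ToyContext, ToyMethods, MallocMethods, toy_alloc, toy_free). It keeps only alloc and free_p and backs them with plain malloc/free. The generic entry points call through the table without knowing which implementation sits behind it. That is the role MemoryContextAlloc plays when it reaches AllocSetAlloc.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct ToyMethods ToyMethods;

/* Hypothetical context: a methods pointer plus a live-allocation counter. */
typedef struct ToyContext
{
	const ToyMethods *methods;	/* plays the role of MemoryContextData.methods */
	size_t		nallocs;		/* live allocations, for illustration only */
} ToyContext;

struct ToyMethods
{
	void	   *(*alloc) (ToyContext *context, size_t size);
	void		(*free_p) (ToyContext *context, void *pointer);
};

/* One hypothetical implementation, backed directly by malloc/free. */
static void *
malloc_alloc(ToyContext *context, size_t size)
{
	void	   *p = malloc(size);

	if (p != NULL)
		context->nallocs++;
	return p;
}

static void
malloc_free(ToyContext *context, void *pointer)
{
	assert(context->nallocs > 0);
	context->nallocs--;
	free(pointer);
}

static const ToyMethods MallocMethods = {malloc_alloc, malloc_free};

/* Generic entry points: they only know the interface, like MemoryContextAlloc. */
static void *
toy_alloc(ToyContext *context, size_t size)
{
	return context->methods->alloc(context, size);
}

static void
toy_free(ToyContext *context, void *pointer)
{
	context->methods->free_p(context, pointer);
}
```

A second implementation would need only a second ToyMethods table. Callers of toy_alloc and toy_free would not change.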

MemoryContext

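At any given time there is one "current" MemoryContext, recorded in the global variable CurrentMemoryContext, and a process calls palloc to allocate memory in that context. To change contexts, the function MemoryContextSwitchTo points CurrentMemoryContext at another memory context.
MemoryContext is an abstract type that could have several implementations, but currently AllocSetContext is the only one. Whenever a new MemoryContext is created, its methods field is set to AllocSetMethods.
In fact, MemoryContext itself does not manage the actual allocation of memory; it is used only to maintain the MemoryContext tree. The memory blocks of a context are managed through the AllocSet structure, and the MemoryContext serves only as the AllocSet's header. AllocSet is a pointer type to an AllocSetContext structure.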
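The "current context" idea can be modeled with a single global pointer. In this hypothetical sketch, toy_switch_to mirrors MemoryContextSwitchTo by returning the previous context. toy_palloc allocates nothing; it only records how many bytes were charged to whichever context is current.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy context: it only counts bytes charged to it. */
typedef struct ToyContext
{
	const char *name;
	size_t		allocated;
} ToyContext;

/* Plays the role of the global CurrentMemoryContext. */
static ToyContext *CurrentToyContext = NULL;

/* Make cxt current and return the previous one, as MemoryContextSwitchTo does. */
static ToyContext *
toy_switch_to(ToyContext *cxt)
{
	ToyContext *old = CurrentToyContext;

	CurrentToyContext = cxt;
	return old;
}

/* palloc-like helper: charges the request to whichever context is current. */
static void
toy_palloc(size_t size)
{
	assert(CurrentToyContext != NULL);
	CurrentToyContext->allocated += size;
}
```

Typical use saves the return value, allocates in a short-lived context, and switches back: `old = toy_switch_to(&tmp); ... toy_switch_to(old);`. PostgreSQL code commonly follows the same save/restore pattern with MemoryContextSwitchTo.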

The AllocSetContext data structure:

typedef struct AllocSetContext 
{
	MemoryContextData header;	/* this context's header: Standard memory-context fields */
	/* Info about storage allocated in this context: */
	AllocBlock	blocks;			/* head of list of blocks in this set */
	AllocChunk	freelist[ALLOCSET_NUM_FREELISTS];/* free chunk lists of this context */
	bool		isReset;		/* T = no space alloced since last reset */
	/* Allocation parameters for this context: */
	Size		initBlockSize;	/* initial block size */
	Size		maxBlockSize;	/* maximum block size */
	Size		nextBlockSize;	/* next block size to allocate */
	Size		allocChunkLimit;/* effective chunk size limit, consulted when allocating chunks */
	AllocBlock	keeper;			/* if not NULL, keep this block over resets (it is retained, not released) */
} AllocSetContext;

typedef AllocSetContext *AllocSet;// an AllocSet manages the blocks of one memory context

AllocSet structure:

[Figure: AllocSet structure]

Field descriptions:

isReset

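PostgreSQL supports resetting a memory context. A reset releases all memory allocated in the context back through free(), except the keeper block described below, which is retained. When a context is created, its isReset field is set to true, meaning nothing has been allocated since the last reset. As soon as anything is allocated in the context, isReset becomes false. A reset can therefore check isReset first. If it is true, nothing has been allocated in the context, no actual reset work is needed, and the operation is cheaper.
initBlockSize, maxBlockSize, nextBlockSize
- initBlockSize and maxBlockSize are specified when the context is created, and at that point nextBlockSize is set to the same value as initBlockSize.
- nextBlockSize is the size of the next block to allocate: when an allocation needs a new block, the new block takes its size from nextBlockSize.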
  
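  Each time a new regular block is allocated, nextBlockSize is doubled for the following one, so it grows over time. When the context is reset, nextBlockSize must return to its initial value, initBlockSize; initBlockSize thus acts as a saved copy of the starting block size. nextBlockSize cannot grow without limit, though: maxBlockSize caps the size a block can reach.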
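The growth rule fits in a few lines. take_block_size is a hypothetical helper modeled on the nextBlockSize update in AllocSetAlloc, shown later. It returns the size of the block being created now and doubles nextBlockSize for the next one, capped at maxBlockSize. It ignores the extra doubling AllocSetAlloc applies when a single request needs a bigger block.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper modeled on the nextBlockSize update in AllocSetAlloc:
 * returns the size for the block being created now and advances *next.
 */
static size_t
take_block_size(size_t *next, size_t maxBlockSize)
{
	size_t		blksize = *next;

	assert(blksize > 0 && blksize <= maxBlockSize);
	*next <<= 1;					/* double for the following block */
	if (*next > maxBlockSize)
		*next = maxBlockSize;		/* but never beyond maxBlockSize */
	return blksize;
}
```

With initBlockSize = 8 KB and maxBlockSize = 32 KB, successive blocks are 8 KB, 16 KB, 32 KB, 32 KB, and so on. Setting the counter back to 8 KB is what a reset does.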
allocChunkLimit

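A block is divided into units called memory chunks. If a requested chunk exceeds the ALLOC_CHUNK_LIMIT macro, the chunk gets a separate block of its own. This avoids excessive fragmentation when the memory is later reclaimed.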

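ALLOC_CHUNK_LIMIT is a compile-time macro and cannot change at run time, so PostgreSQL adds the per-context allocChunkLimit field. AllocSetContextCreate computes it as follows. It starts at ALLOC_CHUNK_LIMIT and is halved until a chunk of that size, plus the block and chunk headers, fits within maxBlockSize. The oversize check then compares against this field instead of the macro.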
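The allocChunkLimit computation, taken from AllocSetContextCreate below, can be tried in isolation. The header sizes are assumptions for a typical 64-bit build without MEMORY_CONTEXT_CHECKING: a 32-byte block header and a 16-byte chunk header. ALLOC_CHUNK_LIMIT is defined here to equal 8 KB, consistent with the 11 freelists and 8-byte minimum described in this article.

```c
#include <assert.h>
#include <stddef.h>

#define ALLOC_MINBITS			3	/* smallest chunk: 8 bytes */
#define ALLOCSET_NUM_FREELISTS	11
#define ALLOC_CHUNK_LIMIT		(1 << (ALLOCSET_NUM_FREELISTS - 1 + ALLOC_MINBITS))	/* 8192 */

/* Assumed header sizes: 64-bit build without MEMORY_CONTEXT_CHECKING. */
#define TOY_BLOCKHDRSZ	32
#define TOY_CHUNKHDRSZ	16

/* Same loop as in AllocSetContextCreate, using the assumed header sizes. */
static size_t
compute_chunk_limit(size_t maxBlockSize)
{
	size_t		limit = ALLOC_CHUNK_LIMIT;

	assert(maxBlockSize > TOY_BLOCKHDRSZ + TOY_CHUNKHDRSZ);
	/* halve until one chunk of this size plus both headers fits in a max-size block */
	while (limit > maxBlockSize - TOY_BLOCKHDRSZ - TOY_CHUNKHDRSZ)
		limit >>= 1;
	return limit;
}
```

MemoryContextInit creates TopMemoryContext and ErrorContext with an 8 KB maxBlockSize. Under these assumptions such a context ends up with allocChunkLimit = 4096, so a 5000-byte request in it gets a dedicated block.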
keeper

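When a context is reset, the block recorded in keeper is not released. Instead it is emptied, so its whole space becomes free again.
- This guarantees that, right after a reset, the context already has some usable memory without calling malloc again.
- It also avoids the overhead of calling malloc over and over when a context is reset repeatedly. For example, during query execution each fetched tuple needs a memory context for related information, and that context is reset and reused when the next tuple is fetched. Keeping one block in keeper avoids a malloc after every reset.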
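A simplified reset loop shows how the keeper block survives. ToyBlock, ToySet, toy_add_block and toy_reset are hypothetical names. The sketch frees every block except the keeper and rewinds the keeper's free pointer so its whole space is free again. The keeper is left as the only block. A real reset has more bookkeeping, such as clearing the freelists and restoring nextBlockSize, which is omitted here.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy block header: a list link plus free/end pointers. */
typedef struct ToyBlock
{
	struct ToyBlock *next;
	char	   *freeptr;		/* start of free space in this block */
	char	   *endptr;			/* end of this block */
} ToyBlock;

typedef struct ToySet
{
	ToyBlock   *blocks;			/* head of the block list */
	ToyBlock   *keeper;			/* block retained across resets, or NULL */
} ToySet;

/* malloc a block of size bytes (header included) and push it on the list. */
static ToyBlock *
toy_add_block(ToySet *set, size_t size)
{
	ToyBlock   *block;

	assert(size >= sizeof(ToyBlock));
	block = malloc(size);
	if (block == NULL)
		abort();
	block->freeptr = (char *) block + sizeof(ToyBlock);
	block->endptr = (char *) block + size;
	block->next = set->blocks;
	set->blocks = block;
	return block;
}

/* Free every block except the keeper and empty the keeper; returns blocks freed. */
static int
toy_reset(ToySet *set)
{
	int			freed = 0;
	ToyBlock   *block = set->blocks;

	while (block != NULL)
	{
		ToyBlock   *next = block->next;

		if (block == set->keeper)
		{
			block->freeptr = (char *) block + sizeof(ToyBlock);	/* whole block free again */
			block->next = NULL;
		}
		else
		{
			free(block);
			freed++;
		}
		block = next;
	}
	set->blocks = set->keeper;	/* keeper (possibly NULL) is now the only block */
	return freed;
}
```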
freelist

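This array tracks free chunks that have been returned within blocks, so they can be reused by later allocations. Its elements have type AllocChunk, and its length is 11, defined by the macro ALLOCSET_NUM_FREELISTS.
- Each element of freelist heads a linked list of free chunks of one particular size, determined by the element's position in the array. The list at index k (counting from 0) holds chunks of 2^(k+3) bytes, so free chunks range from 8 bytes up to 8 KB. The array therefore maintains 11 free lists, one for each of 11 chunk sizes. Yet the AllocChunkData structure has no obvious field for linking a chunk into a free list.
- The explanation is that the aset field serves two purposes:
  - If a chunk is in use, its aset field points to the AllocSet that owns it.
  - If a chunk is free (that is, it sits on a free list), its aset field points to the next chunk on that list, acting as a next pointer.
  Starting from the list head stored in a freelist element and following aset from chunk to chunk therefore visits every free chunk on that list.
- Note that every chunk on the free lists has a power-of-two size, so the free list matching a request can be located quickly.
  
  A request of size bytes is served from free list K, where K is computed as follows:
  - If size ≤ (1 << ALLOC_MINBITS), K is 0. ALLOC_MINBITS is predefined as 3, so the smallest chunk on the free lists is 2^3 = 8 bytes.
  - If (1 << ALLOC_MINBITS) < size ≤ ALLOC_CHUNK_LIMIT and 2^(N-1) < size ≤ 2^N, then K = N-3. Here ALLOC_CHUNK_LIMIT is the largest chunk size the free lists can hold.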
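The index rule can be written as a short loop. free_index and chunk_size_for are hypothetical names. PostgreSQL's own AllocSetFreeIndex may be implemented differently, for instance with a lookup table. This sketch only reproduces the mapping described above.

```c
#include <assert.h>
#include <stddef.h>

#define ALLOC_MINBITS			3	/* smallest chunk is 1 << 3 = 8 bytes */
#define ALLOCSET_NUM_FREELISTS	11

/* Map a request size to a freelist index: <= 8 -> 0, 9..16 -> 1, 17..32 -> 2, ... */
static int
free_index(size_t size)
{
	int			idx = 0;

	if (size > 0)
	{
		size = (size - 1) >> ALLOC_MINBITS;
		while (size != 0)
		{
			idx++;
			size >>= 1;
		}
		assert(idx < ALLOCSET_NUM_FREELISTS);
	}
	return idx;
}

/* Chunk size handed out for freelist idx (same formula as AllocSetAlloc). */
static size_t
chunk_size_for(int idx)
{
	return ((size_t) 1 << ALLOC_MINBITS) << idx;
}
```

For example, free_index(30) is 2 and chunk_size_for(2) is 32, matching the 30-byte example discussed later. free_index(8192) is 10, the last of the 11 lists.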

The AllocBlockData data structure:

typedef struct AllocBlockData // describes one memory region obtained from malloc, called a block
{
	AllocSet	aset;			/* the AllocSet that owns this block */
	AllocBlock	next;			/* next block in aset's blocks list */
	char	   *freeptr;		/* start of free space in this block */
	char	   *endptr;			/* end of space in this block */
} AllocBlockData;
typedef struct AllocBlockData *AllocBlock;		/* forward reference */

The AllocChunkData data structure:

typedef struct AllocChunkData
{
	/* aset is the owning aset if allocated, or the freelist link if free */
	void	   *aset;// the owning AllocSet; while the chunk is free, links to the next chunk on its free list
	/* size is always the size of the usable space in the chunk */
	Size		size;// usable size of the chunk; chunks are rounded up to a power of 2, so the requested size may be smaller
#ifdef MEMORY_CONTEXT_CHECKING
	/* when debugging memory usage, also store actual requested size */
	/* this is zero in a free chunk */
	Size		requested_size;// size actually requested by the caller; 0 for a free chunk
#endif
} AllocChunkData;
typedef struct AllocChunkData *AllocChunk;
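The dual use of aset, and the layout that puts the header right before the data, can be sketched with a toy chunk. ToyChunk, ToyChunkGetPointer, ToyPointerGetChunk, freelist_push and freelist_pop are hypothetical names. They stand in for AllocChunkData and the AllocChunkGetPointer/AllocPointerGetChunk macros used in the code below. The header size here is plain sizeof(ToyChunk), whereas the real code MAXALIGNs it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical toy header with the two fields of AllocChunkData. */
typedef struct ToyChunk
{
	void	   *aset;			/* owner while in use; next free chunk while free */
	size_t		size;			/* usable bytes after the header */
} ToyChunk;

/* Real code uses MAXALIGN(sizeof(AllocChunkData)); plain sizeof here. */
#define TOY_CHUNKHDRSZ				sizeof(ToyChunk)
#define ToyChunkGetPointer(chk)		((void *) (((char *) (chk)) + TOY_CHUNKHDRSZ))
#define ToyPointerGetChunk(ptr)		((ToyChunk *) (((char *) (ptr)) - TOY_CHUNKHDRSZ))

/* Put a free chunk at the head of a freelist: aset becomes the "next" link. */
static void
freelist_push(ToyChunk **head, ToyChunk *chunk)
{
	chunk->aset = *head;
	*head = chunk;
}

/* Take the head chunk off a freelist: aset goes back to meaning "owner". */
static ToyChunk *
freelist_pop(ToyChunk **head, void *owner)
{
	ToyChunk   *chunk = *head;

	if (chunk != NULL)
	{
		*head = (ToyChunk *) chunk->aset;
		chunk->aset = owner;
	}
	assert(chunk == NULL || chunk->aset == owner);
	return chunk;
}
```

The header sits immediately before the pointer handed to the caller. A function given only that pointer, as pfree and repalloc are, can therefore step back to the header and find the owning set in aset.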

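Initialization and Creation

Every PostgreSQL process must initialize the memory context system before it can use memory contexts.

Initialization is performed by the function MemoryContextInit.

It first creates ***TopMemoryContext, the root of all memory contexts***, and then creates ***ErrorContext, a child node*** under it for error-recovery processing.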

/*
 * MemoryContextInit
 *		Start up the memory-context subsystem.
 *
 * This must be called before creating contexts or allocating memory in
 * contexts.  TopMemoryContext and ErrorContext are initialized here;
 * other contexts must be created afterwards.
 *
 * In normal multi-backend operation, this is called once during
 * postmaster startup, and not at all by individual backend startup
 * (since the backends inherit an already-initialized context subsystem
 * by virtue of being forked off the postmaster).
 * (That is, initialization runs once, at postmaster startup; each backend then creates its own nodes under this top-level context.)
 * In a standalone backend this must be called during backend startup.
 */
void
MemoryContextInit(void)
{
	AssertState(TopMemoryContext == NULL);

	/*
	 * Initialize TopMemoryContext as an AllocSetContext with slow growth rate
	 * --- we don't really expect much to be allocated in it.
	 *
	 * TopMemoryContext exists from the moment it is allocated until exit.
	 * The space for other context nodes themselves is allocated under it.
	 * (There is special-case code in MemoryContextCreate() for this call.)
	 */
	TopMemoryContext = AllocSetContextCreate((MemoryContext) NULL,
											 "TopMemoryContext",
											 0,
											 8 * 1024,
											 8 * 1024);

	/*
	 * Not having any other place to point CurrentMemoryContext, make it point
	 * to TopMemoryContext.  Caller should change this soon!
	 */
	CurrentMemoryContext = TopMemoryContext;	

	/*
	 * Initialize ErrorContext as an AllocSetContext with slow growth rate ---
	 * we don't really expect much to be allocated in it. More to the point,
	 * require it to contain at least 8K at all times. This is the only case
	 * where retained memory in a context is *essential* --- we want to be
	 * sure ErrorContext still has some memory even if we've run out
	 * elsewhere!
	 * ErrorContext is the first child created under TopMemoryContext: a permanent context for error recovery, reset once recovery is complete.
	 */
	ErrorContext = AllocSetContextCreate(TopMemoryContext,
										 "ErrorContext",
										 8 * 1024,
										 8 * 1024,
										 8 * 1024);
}

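Once initialization is complete, a PostgreSQL process can create other memory contexts.

Memory contexts are created by AllocSetContextCreate, which has two main jobs: creating the context node and, when minContextSize calls for it, allocating an initial memory block.

The context node itself is created by MemoryContextCreate.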

/*
 * AllocSetContextCreate
 *		Create a new AllocSet context.
 *
 * parent: parent context, or NULL if top-level context
 * name: name of context (for debugging --- string will be copied)
 * minContextSize: minimum context size
 * initBlockSize: initial allocation block size
 * maxBlockSize: maximum allocation block size
 */
MemoryContext
AllocSetContextCreate(// parent: a MemoryContext, the parent of the context being created
					  MemoryContext parent,
					  // name: a string, the name of the context being created
					  const char *name,
					  // minContextSize, initBlockSize, maxBlockSize: all of type Size (size_t)
					  Size minContextSize,// minimum context size
					  Size initBlockSize,// initial block size
					  Size maxBlockSize)// maximum block size
{
	AllocSet	context;
	/* Do the type-independent part of context creation */
	// 1) Initialize the context node, creating the MemoryContext structure.
	// 2) Cast the MemoryContext created in step 1 to an AllocSet.
	context = (AllocSet) MemoryContextCreate(T_AllocSetContext,
											 sizeof(AllocSetContext),
											 &AllocSetMethods,
											 parent,
											 name);

	// Fill in the remaining AllocSet fields: initial, maximum, and next block sizes (minContextSize is handled further below).
	/*
	 * Make sure alloc parameters are reasonable, and save them.
	 *
	 * We somewhat arbitrarily enforce a minimum 1K block size.
	 */
	initBlockSize = MAXALIGN(initBlockSize);
	if (initBlockSize < 1024)
		initBlockSize = 1024;
	maxBlockSize = MAXALIGN(maxBlockSize);
	if (maxBlockSize < initBlockSize)
		maxBlockSize = initBlockSize;
	context->initBlockSize = initBlockSize;
	context->maxBlockSize = maxBlockSize;
	context->nextBlockSize = initBlockSize;

	/*
	 * Compute the allocation chunk size limit for this context.  It can't be
	 * more than ALLOC_CHUNK_LIMIT because of the fixed number of freelists.
	 * If maxBlockSize is small then requests exceeding the maxBlockSize
	 * should be treated as large chunks, too.	We have to have
	 * allocChunkLimit a power of two, because the requested and
	 * actually-allocated sizes of any chunk must be on the same side of the
	 * limit, else we get confused about whether the chunk is "big".
	 */
	// Derive allocChunkLimit from the maximum block size.
	context->allocChunkLimit = ALLOC_CHUNK_LIMIT;
	while (context->allocChunkLimit >
		   (Size) (maxBlockSize - ALLOC_BLOCKHDRSZ - ALLOC_CHUNKHDRSZ))
		context->allocChunkLimit >>= 1;

	/*
	 * Grab always-allocated space, if requested
	 */
	// 3) If minContextSize exceeds the combined size of the block header and the chunk header,
	if (minContextSize > ALLOC_BLOCKHDRSZ + ALLOC_CHUNKHDRSZ)
	{
		// allocate a block of minContextSize bytes with the standard library's malloc
		Size		blksize = MAXALIGN(minContextSize);
		AllocBlock	block;

		block = (AllocBlock) malloc(blksize);
		if (block == NULL)
		{
			MemoryContextStats(TopMemoryContext);
			ereport(ERROR,
					(errcode(ERRCODE_OUT_OF_MEMORY),
					 errmsg("out of memory"),
					 errdetail("Failed while creating memory context \"%s\".",
							   name)));
		}
		// Initialize the block header.
		block->aset = context;
		block->freeptr = ((char *) block) + ALLOC_BLOCKHDRSZ;
		block->endptr = ((char *) block) + blksize;
		// Add it to the AllocSet's block list (insertion at the head).
		block->next = context->blocks;
		context->blocks = block;
		// Note that the preallocated block is recorded in the context's keeper field as a retained block, so it is not freed when the context is reset.
		/* Mark block as not to be released at reset time */
		context->keeper = block;
	}

	context->isReset = true;

	return (MemoryContext) context;
}
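The parameter normalization at the start of AllocSetContextCreate is easy to check on its own. normalize_block_sizes and TOY_MAXALIGN are hypothetical. The sketch assumes 8-byte maximum alignment, which is common on 64-bit platforms but platform dependent.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 8-byte maximum alignment, typical of 64-bit platforms. */
#define TOY_MAXIMUM_ALIGNOF	8
#define TOY_MAXALIGN(len) \
	(((size_t) (len) + (TOY_MAXIMUM_ALIGNOF - 1)) & ~((size_t) (TOY_MAXIMUM_ALIGNOF - 1)))

/* Same rules as the start of AllocSetContextCreate above. */
static void
normalize_block_sizes(size_t *initBlockSize, size_t *maxBlockSize)
{
	*initBlockSize = TOY_MAXALIGN(*initBlockSize);
	if (*initBlockSize < 1024)
		*initBlockSize = 1024;				/* enforced 1K minimum */
	*maxBlockSize = TOY_MAXALIGN(*maxBlockSize);
	if (*maxBlockSize < *initBlockSize)
		*maxBlockSize = *initBlockSize;		/* max may not be below init */
	assert(*maxBlockSize >= *initBlockSize);
}
```

A request for initBlockSize = 0 becomes 1024. A maxBlockSize smaller than initBlockSize is raised to match it.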

Context node initialization

/*--------------------
 * MemoryContextCreate
 *		Context-type-independent part of context creation.
 *
 * This is only intended to be called by context-type-specific
 * context creation routines, not by the unwashed masses.
 *
 * The context creation procedure is a little bit tricky because
 * we want to be sure that we don't leave the context tree invalid
 * in case of failure (such as insufficient memory to allocate the
 * context node itself).  The procedure goes like this:
 *	1.	Context-type-specific routine first calls MemoryContextCreate(),
 *		passing the appropriate tag/size/methods values (the methods
 *		pointer will ordinarily point to statically allocated data).
 *		The parent and name parameters usually come from the caller.
 *	2.	MemoryContextCreate() attempts to allocate the context node,
 *		plus space for the name.  If this fails we can ereport() with no
 *		damage done.
 *	3.	We fill in all of the type-independent MemoryContext fields.
 *	4.	We call the type-specific init routine (using the methods pointer).
 *		The init routine is required to make the node minimally valid
 *		with zero chance of failure --- it can't allocate more memory,
 *		for example.
 *	5.	Now we have a minimally valid node that can behave correctly
 *		when told to reset or delete itself.  We link the node to its
 *		parent (if any), making the node part of the context tree.
 *	6.	We return to the context-type-specific routine, which finishes
 *		up type-specific initialization.  This routine can now do things
 *		that might fail (like allocate more memory), so long as it's
 *		sure the node is left in a state that delete will handle.
 *
 * This protocol doesn't prevent us from leaking memory if step 6 fails
 * during creation of a top-level context, since there's no parent link
 * in that case.  However, if you run out of memory while you're building
 * a top-level context, you might as well go home anyway...
 *
 * Normally, the context node and the name are allocated from
 * TopMemoryContext (NOT from the parent context, since the node must
 * survive resets of its parent context!).	However, this routine is itself
 * used to create TopMemoryContext!  If we see that TopMemoryContext is NULL,
 * we assume we are creating TopMemoryContext and use malloc() to allocate
 * the node.
 *
 * Note that the name field of a MemoryContext does not point to
 * separately-allocated storage, so it should not be freed at context
 * deletion.
 *--------------------
 */
MemoryContext
MemoryContextCreate(NodeTag tag, Size size,
					MemoryContextMethods *methods,
					MemoryContext parent,
					const char *name)
{
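	// 1) Allocate memory from TopMemoryContext to hold the context node (the blocks the context will manage are not included). The allocation is slightly larger than the AllocSet structure, because the context name is stored right after it.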
	MemoryContext node;
	Size		needed = size + strlen(name) + 1;// +1 for the name's terminating '\0'

	/* Get space for node and name */
	if (TopMemoryContext != NULL)
	{
		/* Normal case: allocate the node in TopMemoryContext */
		node = (MemoryContext) MemoryContextAlloc(TopMemoryContext,
												  needed);
	}
	else
	{
		/* Special case for startup: use good ol' malloc */
		node = (MemoryContext) malloc(needed);
		Assert(node != NULL);
	}
	// 2) Initialize the MemoryContext node: set its parent, node type, name, and other fields, and set its methods field (AllocSetMethods for an AllocSet).
	/* Initialize the node as best we can */
	MemSet(node, 0, size);
	node->type = tag;
	node->methods = methods;
	node->parent = NULL;		/* for the moment */
	node->firstchild = NULL;
	node->nextchild = NULL;
	node->name = ((char *) node) + size;
	strcpy(node->name, name);

	/* Type-specific routine finishes any other essential initialization */
	(*node->methods->init) (node);

	/* OK to link node to parent (if any) */
	if (parent)
	{   // insert at the head of the parent's child list
		node->parent = parent;
		node->nextchild = parent->firstchild;
		parent->firstchild = node;
	}

	/* Return to type-specific creation routine to finish up */
	return node;
}
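The allocation size needed = size + strlen(name) + 1 reserves one byte beyond the name's characters for the terminating '\0' that strcpy writes. The hypothetical sketch below (ToyNode, toy_node_create) uses the same single-allocation layout. The name lives inside the node's own allocation, so releasing the node releases the name too. That is why the comment above says the name should not be freed separately.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical node; only the name handling matters here. */
typedef struct ToyNode
{
	int			type;
	char	   *name;			/* points just past the struct, in the same allocation */
} ToyNode;

/* One allocation holds the node (size bytes) followed by a copy of name. */
static ToyNode *
toy_node_create(size_t size, const char *name)
{
	size_t		needed = size + strlen(name) + 1;	/* +1 for the terminating '\0' */
	ToyNode    *node;

	assert(size >= sizeof(ToyNode));
	node = malloc(needed);
	if (node == NULL)
		return NULL;
	memset(node, 0, size);
	node->name = ((char *) node) + size;
	strcpy(node->name, name);
	return node;
}
```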

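Memory Allocation

Once the memory context system has been initialized and context nodes have been created, memory can be allocated.

Allocation goes mainly through two interface functions, MemoryContextAlloc and MemoryContextAllocZero.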

/*
 * MemoryContextAlloc
 *		Allocate space within the specified context.
 *
 * This could be turned into a macro, but we'd have to import
 * nodes/memnodes.h into postgres.h which seems a bad idea.
 */
void *
MemoryContextAlloc(MemoryContext context, Size size)
{
	AssertArg(MemoryContextIsValid(context));

	if (!AllocSizeIsValid(size))
		elog(ERROR, "invalid memory alloc request size %lu",
			 (unsigned long) size);

	return (*context->methods->alloc) (context, size);
}

/*
 * MemoryContextAllocZero
 *		Like MemoryContextAlloc, but clears allocated memory
 *
 *	We could just call MemoryContextAlloc then clear the memory, but this
 *	is a very common combination, so we provide the combined operation.
 */
void *
MemoryContextAllocZero(MemoryContext context, Size size)
{
	void	   *ret;

	AssertArg(MemoryContextIsValid(context));

	if (!AllocSizeIsValid(size))
		elog(ERROR, "invalid memory alloc request size %lu",
			 (unsigned long) size);

	ret = (*context->methods->alloc) (context, size);

	MemSetAligned(ret, 0, size);// zero the allocated memory

	return ret;
}

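In PostgreSQL, memory is allocated, reallocated, and freed within memory contexts, so the C standard library functions malloc, realloc, and free are not used for this directly. Instead, PostgreSQL provides palloc, repalloc, and pfree to allocate, reallocate, and free memory within a memory context.

palloc is a macro that expands to a call to MemoryContextAlloc on the "current" memory context. MemoryContextAlloc in turn calls the alloc function pointer in that context's methods field. In the current PostgreSQL implementation, calling palloc therefore amounts to calling AllocSetAlloc, which alloc points to. **Memory obtained with palloc is not initialized, so its contents are arbitrary.** The corresponding macro palloc0 sets all of the allocated memory to 0.
repalloc is a function that takes a pointer to a chunk and a new chunk size. It calls the realloc function pointer of the context that owns the chunk, resizes the chunk to the new size, and returns a pointer to the resulting chunk. Currently the realloc pointer refers to AllocSetRealloc.
pfree is a function that takes a pointer to a chunk. It calls the free_p function pointer in the methods field of the chunk's owning context to release the chunk's space. In PostgreSQL, free_p currently points to AllocSetFree.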

AllocSetAlloc

/*
 * AllocSetAlloc
 *		Returns pointer to allocated memory of given size; memory is added
 *		to the set.
 *		AllocSetAlloc does the actual work of allocating memory.
 */
static void *
AllocSetAlloc(MemoryContext context, Size size)// takes a context node and the requested size
{
	AllocSet	set = (AllocSet) context;
	AllocBlock	block;
	AllocChunk	chunk;
	int			fidx;
	Size		chunk_size;
	Size		blksize;

	AssertArg(AllocSetIsValid(set));

	/*
	 * If requested size exceeds maximum for chunks, allocate an entire block
	 * for this request.
	 * 1) Check whether the requested size exceeds the largest chunk this context
	 * may allocate (the allocChunkLimit field); if it does not, go to step 2.
	 */
	if (size > set->allocChunkLimit)
	{
		// If it does, allocate a new, separate block for the request.
		chunk_size = MAXALIGN(size);
		blksize = chunk_size + ALLOC_BLOCKHDRSZ + ALLOC_CHUNKHDRSZ;
		block = (AllocBlock) malloc(blksize);
		if (block == NULL)
		{
			MemoryContextStats(TopMemoryContext);
			ereport(ERROR,
					(errcode(ERRCODE_OUT_OF_MEMORY),
					 errmsg("out of memory"),
					 errdetail("Failed on request of size %lu.",
							   (unsigned long) size)));
		}
		block->aset = set;
		block->freeptr = block->endptr = ((char *) block) + blksize;
		// then carve a chunk of the requested size out of that block
		chunk = (AllocChunk) (((char *) block) + ALLOC_BLOCKHDRSZ);
		chunk->aset = set;
		chunk->size = chunk_size;
#ifdef MEMORY_CONTEXT_CHECKING
		chunk->requested_size = size;
		/* set mark to catch clobber of "unused" space */
		if (size < chunk_size)
			((char *) AllocChunkGetPointer(chunk))[size] = 0x7E;
#endif
#ifdef RANDOMIZE_ALLOCATED_MEMORY
		/* fill the allocated space with junk */
		randomize_mem((char *) AllocChunkGetPointer(chunk), size);
#endif

		/*
		 * Stick the new block underneath the active allocation block, so that
		 * we don't lose the use of the space remaining therein.
		 * Next, add the block to the block list, just behind the active block.
		 */
		if (set->blocks != NULL)
		{
			block->next = set->blocks->next;
			set->blocks->next = block;
		}
		else
		{
			block->next = NULL;
			set->blocks = block;
		}
		// Finally, set the context's isReset field to false and return the chunk pointer.
		set->isReset = false;

		AllocAllocInfo(set, chunk);
		return AllocChunkGetPointer(chunk);
	}

	/*
	 * Request is small enough to be treated as a chunk.  Look in the
	 * corresponding free list to see if there is a free chunk we could reuse.
	 * If one is found, remove it from the free list, make it again a member
	 * of the alloc set and return its data address.
	 * 2) Find the freelist index for the requested size; if that free list has no suitable chunk, go to step 3.
	 */
	fidx = AllocSetFreeIndex(size);
	chunk = set->freelist[fidx];
	if (chunk != NULL)
	{
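		// If a suitable free chunk exists, point the free list head (an element of the freelist array) at the address in this chunk's aset field (in a free chunk, aset points to the next chunk on the list).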
		Assert(chunk->size >= size);

		set->freelist[fidx] = (AllocChunk) chunk->aset;

		chunk->aset = (void *) set;// then point the chunk's aset field back at its owning context

#ifdef MEMORY_CONTEXT_CHECKING
		chunk->requested_size = size;
		/* set mark to catch clobber of "unused" space */
		if (size < chunk->size)
			((char *) AllocChunkGetPointer(chunk))[size] = 0x7E;
#endif
#ifdef RANDOMIZE_ALLOCATED_MEMORY
		/* fill the allocated space with junk */
		randomize_mem((char *) AllocChunkGetPointer(chunk), size);
#endif

		/* isReset must be false already */
		Assert(!set->isReset);

		AllocAllocInfo(set, chunk);
		return AllocChunkGetPointer(chunk);// finally, return the chunk pointer
	}

	/*
	 * Choose the actual chunk size to allocate.
	 */
	chunk_size = (1 << ALLOC_MINBITS) << fidx;
	Assert(chunk_size >= size);

	/*
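	 * If there is enough room in the active allocation block, we will put the
	 * chunk into that block.  Else must start a new one.
	 * 3) Check the first block on the context's block list (the blocks field). If its unallocated space can satisfy the request, allocate the chunk there and return its pointer.
	 * Allocation in a context always happens in the first block of the list. When that block runs out of space, a new block is allocated and becomes the new head of the list, so the first block on the list is also called the active block.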
	 */
	if ((block = set->blocks) != NULL)
	{
		Size		availspace = block->endptr - block->freeptr;
		// If the first block lacks enough unallocated space, go to step 4.
		if (availspace < (chunk_size + ALLOC_CHUNKHDRSZ))
		{
			/*
			 * The existing active (top) block does not have enough room for
			 * the requested allocation, but it might still have a useful
			 * amount of space in it.  Once we push it down in the block list,
			 * we'll never try to allocate more space from it. So, before we
			 * do that, carve up its free space into chunks that we can put on
			 * the set's freelists.
			 *
			 * Because we can only get here when there's less than
			 * ALLOC_CHUNK_LIMIT left in the block, this loop cannot iterate
			 * more than ALLOCSET_NUM_FREELISTS-1 times.
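			 * 4) No existing block can satisfy this request, so a new block is needed. The current active block still has unallocated space, however, and simply making a new block the active one would waste that space.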
			 */
			while (availspace >= ((1 << ALLOC_MINBITS) + ALLOC_CHUNKHDRSZ))
			{
				Size		availchunk = availspace - ALLOC_CHUNKHDRSZ;
				int			a_fidx = AllocSetFreeIndex(availchunk);

				/*
				 * In most cases, we'll get back the index of the next larger
				 * freelist than the one we need to put this chunk on.	The
				 * exception is when availchunk is exactly a power of 2.
				 */
				if (availchunk != (1 << (a_fidx + ALLOC_MINBITS)))
				{
					a_fidx--;
					Assert(a_fidx >= 0);
					availchunk = (1 << (a_fidx + ALLOC_MINBITS));
				}
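// To avoid the waste, first split the active block's unallocated space into as few chunks as possible (each as large as possible), add them to the freelist array, and then set block = NULL.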
				chunk = (AllocChunk) (block->freeptr);

				block->freeptr += (availchunk + ALLOC_CHUNKHDRSZ);
				availspace -= (availchunk + ALLOC_CHUNKHDRSZ);

				chunk->size = availchunk;
#ifdef MEMORY_CONTEXT_CHECKING
				chunk->requested_size = 0;		/* mark it free */
#endif
				chunk->aset = (void *) set->freelist[a_fidx];
				set->freelist[a_fidx] = chunk;
			}

			/* Mark that we need to create a new block */ 
			block = NULL;
		}
	}

	/*
	 * Time to create a new regular (multi-chunk) block?
	 * block is NULL here if it was set to NULL above or if the context had no blocks yet.
	 */
	if (block == NULL)
	{
		Size		required_size;

		/*
		 * The first such block has size initBlockSize, and we double the
		 * space in each succeeding block, but not more than maxBlockSize.
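		 * A new block is then created (twice the size of the previous one, but no larger than maxBlockSize) and becomes the new active block, i.e. it is added at the head of the block list.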
		 */
		blksize = set->nextBlockSize;
		set->nextBlockSize <<= 1;
		if (set->nextBlockSize > set->maxBlockSize)
			set->nextBlockSize = set->maxBlockSize;

		/*
		 * If initBlockSize is less than ALLOC_CHUNK_LIMIT, we could need more
		 * space... but try to keep it a power of 2.
		 */
		required_size = chunk_size + ALLOC_BLOCKHDRSZ + ALLOC_CHUNKHDRSZ;
		while (blksize < required_size)
			blksize <<= 1;

		/* Try to allocate it */
		block = (AllocBlock) malloc(blksize);

		/*
		 * We could be asking for pretty big blocks here, so cope if malloc
		 * fails.】 But give up if there's less than a meg or so available...
		 */
		while (block == NULL && blksize > 1024 * 1024)// a large request failed: retry with halved sizes
		{
			blksize >>= 1;
			if (blksize < required_size)
				break;
			block = (AllocBlock) malloc(blksize);// try malloc again
		}

		if (block == NULL)// every attempt failed
		{
			MemoryContextStats(TopMemoryContext);
			ereport(ERROR,
					(errcode(ERRCODE_OUT_OF_MEMORY),
					 errmsg("out of memory"),
					 errdetail("Failed on request of size %lu.",
							   (unsigned long) size)));
		}

		block->aset = set;
		block->freeptr = ((char *) block) + ALLOC_BLOCKHDRSZ;
		block->endptr = ((char *) block) + blksize;

		/*
		 * If this is the first block of the set, make it the "keeper" block.
		 * Formerly, a keeper block could only be created during context
		 * creation, but allowing it to happen here lets us have fast reset
		 * cycling even for contexts created with minContextSize = 0; that way
		 * we don't have to force space to be allocated in contexts that might
		 * never need any space.  Don't mark an oversize block as a keeper,
		 * however.
		 */
		if (set->keeper == NULL && blksize == set->initBlockSize)
			set->keeper = block;
		// insert at the head of blocks
		block->next = set->blocks;
		set->blocks = block;
	}

	/*
	 * OK, do the allocation
	 * Finally, carve a chunk of the requested size out of the active block and return its pointer.
	 */
	chunk = (AllocChunk) (block->freeptr);

	block->freeptr += (chunk_size + ALLOC_CHUNKHDRSZ);
	Assert(block->freeptr <= block->endptr);

	chunk->aset = (void *) set;
	chunk->size = chunk_size;
#ifdef MEMORY_CONTEXT_CHECKING
	chunk->requested_size = size;
	/* set mark to catch clobber of "unused" space */
	if (size < chunk->size)
		((char *) AllocChunkGetPointer(chunk))[size] = 0x7E;
#endif
#ifdef RANDOMIZE_ALLOCATED_MEMORY
	/* fill the allocated space with junk */
	randomize_mem((char *) AllocChunkGetPointer(chunk), size);
#endif

	set->isReset = false;

	AllocAllocInfo(set, chunk);
	return AllocChunkGetPointer(chunk);// finally, return
}

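Note that AllocSetAlloc does not allocate chunks strictly at the requested size. For requests up to allocChunkLimit, it rounds the size up to a power of 2 and allocates that instead. For example, a request for 30 bytes gets a chunk of 2^5 = 32 bytes. The size field of its AllocChunkData is set to 32, while requested_size (present when MEMORY_CONTEXT_CHECKING is enabled) is set to 30.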
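The leftover-space loop in step 4 of AllocSetAlloc can be run on its own. carve_leftover and free_index are hypothetical names, and the 16-byte chunk header is an assumption for a 64-bit build. The loop mirrors the code above: it rounds each piece down to a power of two. It records the resulting chunk sizes in an array instead of pushing them onto the freelists.

```c
#include <assert.h>
#include <stddef.h>

#define ALLOC_MINBITS	3
#define TOY_CHUNKHDRSZ	16		/* assumed chunk header size (64-bit, no checking) */

/* Freelist index for a size: <= 8 -> 0, 9..16 -> 1, 17..32 -> 2, ... */
static int
free_index(size_t size)
{
	int			idx = 0;

	if (size > 0)
	{
		size = (size - 1) >> ALLOC_MINBITS;
		while (size != 0)
		{
			idx++;
			size >>= 1;
		}
	}
	return idx;
}

/*
 * Split availspace leftover bytes into power-of-two chunks, largest first,
 * following the loop in AllocSetAlloc. Stores chunk sizes in out[] and
 * returns how many chunks were produced.
 */
static int
carve_leftover(size_t availspace, size_t out[], int maxout)
{
	int			n = 0;

	while (availspace >= ((size_t) 1 << ALLOC_MINBITS) + TOY_CHUNKHDRSZ)
	{
		size_t		availchunk = availspace - TOY_CHUNKHDRSZ;
		int			a_fidx = free_index(availchunk);

		/* free_index rounds up; step down unless availchunk is an exact power of 2 */
		if (availchunk != ((size_t) 1 << (a_fidx + ALLOC_MINBITS)))
		{
			a_fidx--;
			assert(a_fidx >= 0);
			availchunk = (size_t) 1 << (a_fidx + ALLOC_MINBITS);
		}
		assert(n < maxout);
		out[n++] = availchunk;
		availspace -= availchunk + TOY_CHUNKHDRSZ;
	}
	return n;
}
```

Under these assumptions, 1000 leftover bytes become chunks of 512, 256, 128 and 32 bytes, each with its own 16-byte header. The final 8 bytes cannot hold a header plus the minimum 8-byte chunk, so they are abandoned.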

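Memory Reallocation

Reallocation is implemented by AllocSetRealloc. In general, the contents of the memory that pointer points to are copied into new memory, the old memory is released, and AllocSetRealloc returns a pointer to the new memory. As step 1 below shows, no copy is needed when the existing chunk is already large enough. The overall flow is: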

/*
 * AllocSetRealloc
 *		Returns new pointer to allocated memory of given size; this memory
 *		is added to the set.  Memory associated with given pointer is copied
 *		into the new memory, and the old memory is freed.
 */
static void *
AllocSetRealloc(MemoryContext context, void *pointer, Size size)
// Reallocates the memory that pointer points to within the given context; the new size is given by size.
{
	AllocSet	set = (AllocSet) context;
	AllocChunk	chunk = AllocPointerGetChunk(pointer);
	Size		oldsize = chunk->size;

#ifdef MEMORY_CONTEXT_CHECKING
	/* Test for someone scribbling on unused space in chunk */
	if (chunk->requested_size < oldsize)
		if (((char *) pointer)[chunk->requested_size] != 0x7E)
			elog(WARNING, "detected write past chunk end in %s %p",
				 set->header.name, chunk);
#endif

	/* isReset must be false already */
	Assert(!set->isReset);

	/*
	 * Chunk sizes are aligned to power of 2 in AllocSetAlloc(). Maybe the
	 * allocated area already is >= the new size.  (In particular, we always
	 * fall out here if the requested size is a decrease.)
	 * 1) Since chunks are rounded up to a power of 2 when first allocated, the space behind pointer may already be at least as large as the new size.
	 */
	if (oldsize >= size)
	{
#ifdef MEMORY_CONTEXT_CHECKING
#ifdef RANDOMIZE_ALLOCATED_MEMORY
		/* We can only fill the extra space if we know the prior request */
		if (size > chunk->requested_size)
			randomize_mem((char *) AllocChunkGetPointer(chunk) + chunk->requested_size,
						  size - chunk->requested_size);
#endif
		// In that case, set requested_size in the chunk's AllocChunkData to the new size and return pointer; otherwise go to step 2.
		chunk->requested_size = size;
		/* set mark to catch clobber of "unused" space */
		if (size < oldsize)
			((char *) pointer)[size] = 0x7E;
#endif
		return pointer;
	}
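// 2) If the chunk behind pointer occupies a block of its own, find that block and resize it with realloc() (which may move it), then set the block's freeptr and endptr to the end of the resized block.
// If the chunk does not occupy its own block, go to step 3.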
	if (oldsize > set->allocChunkLimit)
	{
		/*
		 * The chunk must have been allocated as a single-chunk block.	Find
		 * the containing block and use realloc() to make it bigger with
		 * minimum space wastage.
		 */
		AllocBlock	block = set->blocks;
		AllocBlock	prevblock = NULL;
		Size		chksize;
		Size		blksize;

		while (block != NULL) // walk the blocks list to find the block holding chunk
		{
			if (chunk == (AllocChunk) (((char *) block) + ALLOC_BLOCKHDRSZ))
				break;
			prevblock = block;
			block = block->next;
		}
		if (block == NULL)
			elog(ERROR, "could not find block containing chunk %p", chunk);
		/* let's just make sure chunk is the only one in the block */
		Assert(block->freeptr == ((char *) block) +
			   (chunk->size + ALLOC_BLOCKHDRSZ + ALLOC_CHUNKHDRSZ));

		/* Do the realloc */
		chksize = MAXALIGN(size);
		blksize = chksize + ALLOC_BLOCKHDRSZ + ALLOC_CHUNKHDRSZ;
		block = (AllocBlock) realloc(block, blksize);
		if (block == NULL)
		{
			MemoryContextStats(TopMemoryContext);
			ereport(ERROR,
					(errcode(ERRCODE_OUT_OF_MEMORY),
					 errmsg("out of memory"),
					 errdetail("Failed on request of size %lu.",
							   (unsigned long) size)));
		}
		// set the block's freeptr and endptr to the end of the resized block
		block->freeptr = block->endptr = ((char *) block) + blksize;

		/* Update pointers since block has likely been moved */
		chunk = (AllocChunk) (((char *) block) + ALLOC_BLOCKHDRSZ);
		if (prevblock == NULL)
			set->blocks = block;
		else
			prevblock->next = block;
		chunk->size = chksize;

#ifdef MEMORY_CONTEXT_CHECKING
#ifdef RANDOMIZE_ALLOCATED_MEMORY
		/* We can only fill the extra space if we know the prior request */
		randomize_mem((char *) AllocChunkGetPointer(chunk) + chunk->requested_size,
					  size - chunk->requested_size);
#endif

		chunk->requested_size = size;
		/* set mark to catch clobber of "unused" space */
		if (size < chunk->size)
			((char *) AllocChunkGetPointer(chunk))[size] = 0x7E;
#endif

		return AllocChunkGetPointer(chunk);
	}
	else
	{   
		/*
		 * Small-chunk case.  We just do this by brute force, ie, allocate a
		 * new chunk and copy the data.  Since we know the existing data isn't
		 * huge, this won't involve any great memcpy expense, so it's not
		 * worth being smarter.  (At one time we tried to avoid memcpy when it
		 * was possible to enlarge the chunk in-place, but that turns out to
		 * misbehave unpleasantly for repeated cycles of
		 * palloc/repalloc/pfree: the eventually freed chunks go into the
		 * wrong freelist for the next initial palloc request, and so we leak
		 * memory indefinitely.  See pgsql-hackers archives for 2007-08-11.)
		 * 3) Call AllocSetAlloc to allocate a new chunk.
		 */
		AllocPointer newPointer;

		/* allocate new chunk */
		newPointer = AllocSetAlloc((MemoryContext) set, size);

		/* transfer existing data (certain to fit) */
		memcpy(newPointer, pointer, oldsize);// copy the old chunk's data into it

		/* free old chunk */
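		// then call AllocSetFree to release the old chunk.
		// In general AllocSetFree frees a chunk that occupies its own block directly (such chunks are large, so this causes little fragmentation);
		// other chunks go onto a free list for later reuse (frequently freeing small chunks would fragment memory). Here the old chunk is small, so it takes the free-list path.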
		AllocSetFree((MemoryContext) set, pointer);

		return newPointer;
	}
}
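The small-chunk policy of AllocSetRealloc can be summarized with a toy allocator. If the chunk's rounded-up capacity already suffices, the chunk is reused. Otherwise a new chunk is allocated, the old capacity is copied, and the old chunk is freed. toy_alloc, toy_realloc, toy_free and ToyHdr are hypothetical names built on malloc. The sketch omits the single-chunk-block path that uses realloc().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical header in front of each allocation; size is the rounded capacity. */
typedef struct ToyHdr
{
	size_t		size;
} ToyHdr;

/* Round up to a power of two, with an 8-byte minimum like the freelists. */
static size_t
round_pow2(size_t size)
{
	size_t		cap = 8;

	while (cap < size)
		cap <<= 1;
	return cap;
}

static void *
toy_alloc(size_t size)
{
	size_t		cap = round_pow2(size);
	ToyHdr	   *hdr = malloc(sizeof(ToyHdr) + cap);

	if (hdr == NULL)
		return NULL;
	hdr->size = cap;
	return hdr + 1;				/* caller's pointer starts right after the header */
}

static void
toy_free(void *pointer)
{
	free((ToyHdr *) pointer - 1);
}

/* Small-chunk policy: reuse if capacity suffices, else allocate, copy, free. */
static void *
toy_realloc(void *pointer, size_t size)
{
	ToyHdr	   *hdr;
	void	   *newPointer;

	assert(pointer != NULL);
	hdr = (ToyHdr *) pointer - 1;
	if (hdr->size >= size)
		return pointer;			/* already big enough, including every shrink */
	newPointer = toy_alloc(size);
	if (newPointer == NULL)
		return NULL;
	memcpy(newPointer, pointer, hdr->size);	/* old capacity always fits */
	toy_free(pointer);
	return newPointer;
}
```

A 30-byte allocation has 32 bytes of capacity. Growing it to 32 bytes, or shrinking it to 4, returns the same pointer. Growing it to 100 moves the data into a new 128-byte chunk. The source comment above explains why copying is preferred over in-place enlargement: enlarging in place sent later-freed chunks to the wrong freelist.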

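Freeing Memory

Memory in a memory context is released in three main ways:

Freeing a specified chunk in a memory context

To free a specified chunk in a memory context, the function AllocSetFree is called. It works as follows: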

/*
 * AllocSetFree
 *		Frees allocated memory; memory is removed from the set.
 */
static void
AllocSetFree(MemoryContext context, void *pointer)
{
	AllocSet	set = (AllocSet) context;
	AllocChunk	chunk = AllocPointerGetChunk(pointer);

	AllocFreeInfo(set, chunk);

#ifdef MEMORY_CONTEXT_CHECKING
	/* Test for someone scribbling on unused space in chunk */
	if (chunk->requested_size < chunk->size)
		if (((char *) pointer)[chunk->requested_size] != 0x7E)
			elog(WARNING, "detected write past chunk end in %s %p",
				 set->header.name, chunk);
#endif

	if (chunk->size > set->allocChunkLimit)
	{
		/*
		 * Big chunks are certain to have been allocated as single-chunk
		 * blocks.  Find the containing block and return it to malloc().
		 *
		 * 1) If the chunk to be freed is the only chunk in its block, the
		 * whole block is freed directly.
		 */
		AllocBlock	block = set->blocks;
		AllocBlock	prevblock = NULL;

		while (block != NULL)
		{
			if (chunk == (AllocChunk) (((char *) block) + ALLOC_BLOCKHDRSZ))
				break;
			prevblock = block;
			block = block->next;
		}
		if (block == NULL)
			elog(ERROR, "could not find block containing chunk %p", chunk);
		/* let's just make sure chunk is the only one in the block */
		Assert(block->freeptr == ((char *) block) +
			   (chunk->size + ALLOC_BLOCKHDRSZ + ALLOC_CHUNKHDRSZ));

		/* OK, remove block from aset's list and free it */
		if (prevblock == NULL)
			set->blocks = block->next;
		else
			prevblock->next = block->next;
#ifdef CLOBBER_FREED_MEMORY
		/* Wipe freed memory for debugging purposes */
		memset(block, 0x7F, block->freeptr - ((char *) block));
#endif
		free(block);
	}
	else
	{
		/*
		 * Normal case, put the chunk into appropriate freelist.
		 *
		 * 2) Otherwise, the chunk is added to the freelist so that a later
		 * allocation can reuse it.
		 */
		int			fidx = AllocSetFreeIndex(chunk->size);

		chunk->aset = (void *) set->freelist[fidx];

#ifdef CLOBBER_FREED_MEMORY
		/* Wipe freed memory for debugging purposes */
		memset(pointer, 0x7F, chunk->size);
#endif

#ifdef MEMORY_CONTEXT_CHECKING
		/* Reset requested_size to 0 in chunks that are on freelist */
		chunk->requested_size = 0;
#endif
		set->freelist[fidx] = chunk;
	}
}
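The two branches of AllocSetFree can be modelled outside PostgreSQL with a toy set that owns a block list and an array of freelists. In the sketch below, ToySet, ToyBlock, ToyChunk, toy_alloc_big, toy_free_index and toy_free are hypothetical names, and the constants (8-byte minimum chunk, 11 freelists, an 8 KB big-chunk limit) are illustrative assumptions, not values taken from this article. toy_free_index rounds a size up to the next power of two and maps it to a freelist slot, in the spirit of AllocSetFreeIndex.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_MINBITS 3			/* assumption: smallest chunk is 8 bytes */
#define TOY_NUM_FREELISTS 11	/* assumption: freelists for 8 .. 8192 bytes */
#define TOY_CHUNK_LIMIT ((size_t) 1 << (TOY_NUM_FREELISTS - 1 + TOY_MINBITS))

typedef struct ToyBlock
{
	struct ToyBlock *next;		/* next block owned by the set */
} ToyBlock;

typedef struct ToyChunk
{
	size_t		size;			/* usable size of the chunk */
	void	   *aset;			/* owning set; reused as freelist link when free */
} ToyChunk;

typedef struct ToySet
{
	ToyBlock   *blocks;
	ToyChunk   *freelist[TOY_NUM_FREELISTS];
} ToySet;

#define TOY_BLOCKHDRSZ sizeof(ToyBlock)
#define TOY_CHUNKHDRSZ sizeof(ToyChunk)

/* Map a chunk size to a freelist slot: 1..8 -> 0, 9..16 -> 1, 17..32 -> 2, ... */
static int
toy_free_index(size_t size)
{
	int			idx = 0;

	if (size > 0)
	{
		size = (size - 1) >> TOY_MINBITS;
		while (size != 0)
		{
			idx++;
			size >>= 1;
		}
	}
	assert(idx < TOY_NUM_FREELISTS);
	return idx;
}

/* A big chunk gets a block of its own: [block header][chunk header][data] */
static void *
toy_alloc_big(ToySet *set, size_t size)
{
	ToyBlock   *block = malloc(TOY_BLOCKHDRSZ + TOY_CHUNKHDRSZ + size);
	ToyChunk   *chunk;

	if (block == NULL)
		return NULL;
	chunk = (ToyChunk *) ((char *) block + TOY_BLOCKHDRSZ);
	chunk->size = size;
	chunk->aset = set;
	block->next = set->blocks;
	set->blocks = block;
	return (char *) chunk + TOY_CHUNKHDRSZ;
}

static void
toy_free(ToySet *set, void *pointer)
{
	ToyChunk   *chunk = (ToyChunk *) ((char *) pointer - TOY_CHUNKHDRSZ);

	if (chunk->size > TOY_CHUNK_LIMIT)
	{
		/* Big chunk: find its single-chunk block, unlink it and free it */
		ToyBlock   *block = set->blocks;
		ToyBlock   *prevblock = NULL;

		while (block != NULL &&
			   chunk != (ToyChunk *) ((char *) block + TOY_BLOCKHDRSZ))
		{
			prevblock = block;
			block = block->next;
		}
		if (block == NULL)
			abort();			/* PostgreSQL raises elog(ERROR) here */
		if (prevblock == NULL)
			set->blocks = block->next;
		else
			prevblock->next = block->next;
		free(block);
	}
	else
	{
		/* Small chunk: push it onto the head of its size-class freelist */
		int			fidx = toy_free_index(chunk->size);

		chunk->aset = set->freelist[fidx];
		set->freelist[fidx] = chunk;
	}
}
```

Pushing onto the head of the list makes both the free and the later reuse O(1), and reusing the otherwise idle aset field as the "next" link (as line `chunk->aset = (void *) set->freelist[fidx];` does in the original) means a free chunk needs no extra bookkeeping memory.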

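Resetting a memory context

Resetting a memory context is handled by AllocSetReset. It keeps the block designated as the keeper (if the context has one), so that right after the reset the context already has a block of memory available for use.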

/*
 * AllocSetReset
 *		Frees all memory which is allocated in the given set.
 *
 * Actually, this routine has some discretion about what to do.
 * It should mark all allocated chunks freed, but it need not necessarily
 * give back all the resources the set owns.  Our actual implementation is
 * that we hang onto any "keeper" block specified for the set.	In this way,
 * we don't thrash malloc() when a context is repeatedly reset after small
 * allocations, which is typical behavior for per-tuple contexts.
 */
static void
AllocSetReset(MemoryContext context)
{
	AllocSet	set = (AllocSet) context;
	AllocBlock	block;

	AssertArg(AllocSetIsValid(set));

	/* Nothing to do if no pallocs since startup or last reset */
	if (set->isReset)
		return;

#ifdef MEMORY_CONTEXT_CHECKING
	/* Check for corruption and leaks before freeing */
	AllocSetCheck(context);
#endif

	/* Clear chunk freelists */
	MemSetAligned(set->freelist, 0, sizeof(set->freelist));

	block = set->blocks;

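	/*
	 * New blocks list is either empty or just the keeper block.
	 *
	 * During a reset, every block in the context is freed except the one
	 * designated by the keeper field, and the memory on the freelists goes
	 * with it.  The keeper block is emptied (its free pointer is rewound to
	 * the start of its data area), so the context has a block available for
	 * use immediately after the reset.
	 */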
	set->blocks = set->keeper;

	while (block != NULL)
	{
		AllocBlock	next = block->next;
		if (block == set->keeper)
		{
			/* Reset the block, but don't return it to malloc */
			char	   *datastart = ((char *) block) + ALLOC_BLOCKHDRSZ;

#ifdef CLOBBER_FREED_MEMORY
			/* Wipe freed memory for debugging purposes (clears the contents) */
			memset(datastart, 0x7F, block->freeptr - datastart);
#endif
			block->freeptr = datastart;
			block->next = NULL;
		}
		else
		{
			/* Normal case, release the block (every non-keeper block is freed) */
#ifdef CLOBBER_FREED_MEMORY
			/* Wipe freed memory for debugging purposes */
			memset(block, 0x7F, block->freeptr - ((char *) block));
#endif
			free(block);
		}
		block = next;
	}

	/* Reset block size allocation sequence, too */
	set->nextBlockSize = set->initBlockSize;

	set->isReset = true;
}
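The keeper logic of AllocSetReset can be sketched with a toy block list. The names ToySet, ToyBlock, toy_add_block and toy_reset are hypothetical, blocks are plain malloc() allocations, and only the parts that matter for the reset are modelled (the freelists and nextBlockSize are left out).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical block: header followed by a bump-allocated data area */
typedef struct ToyBlock
{
	struct ToyBlock *next;		/* next block in the set */
	char	   *freeptr;		/* start of free space in this block */
} ToyBlock;

#define TOY_BLOCKHDRSZ sizeof(ToyBlock)

typedef struct ToySet
{
	ToyBlock   *blocks;			/* head of the block list */
	ToyBlock   *keeper;			/* block kept across resets, or NULL */
	bool		isReset;		/* true if nothing was allocated since reset */
} ToySet;

/* Add a block with 'size' data bytes at the head of the list */
static ToyBlock *
toy_add_block(ToySet *set, size_t size)
{
	ToyBlock   *block = malloc(TOY_BLOCKHDRSZ + size);

	if (block == NULL)
		return NULL;
	block->freeptr = (char *) block + TOY_BLOCKHDRSZ;
	block->next = set->blocks;
	set->blocks = block;
	set->isReset = false;
	return block;
}

static void
toy_reset(ToySet *set)
{
	ToyBlock   *block;

	assert(set != NULL);
	if (set->isReset)
		return;					/* nothing to do since the last reset */

	block = set->blocks;
	/* The new block list is either empty or just the keeper block */
	set->blocks = set->keeper;

	while (block != NULL)
	{
		ToyBlock   *next = block->next;

		if (block == set->keeper)
		{
			/* Keep the block but rewind its free pointer to the data start */
			block->freeptr = (char *) block + TOY_BLOCKHDRSZ;
			block->next = NULL;
		}
		else
			free(block);		/* every other block goes back to malloc() */
		block = next;
	}
	set->isReset = true;
}
```

Keeping one block means a per-tuple context that is reset after every row does not have to call malloc() and free() each time, which is the motivation given in the function's header comment.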

Freeing all blocks in the current memory context

This is done by AllocSetDelete, which frees every block in the current memory context, including the keeper block.

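AllocSetDelete does not free the memory context node itself. The node is allocated in TopMemoryContext, and releasing it is the caller's job: in mcxt.c, MemoryContextDelete calls pfree on the node after invoking the context's delete method.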

/*
 * AllocSetDelete
 *		Frees all memory which is allocated in the given set,
 *		in preparation for deletion of the set.
 *
 * Unlike AllocSetReset, this *must* free all resources of the set.
 * But note we are not responsible for deleting the context node itself.
 */
static void
AllocSetDelete(MemoryContext context)
{
	AllocSet	set = (AllocSet) context;
	AllocBlock	block = set->blocks;

	AssertArg(AllocSetIsValid(set));

#ifdef MEMORY_CONTEXT_CHECKING
	/* Check for corruption and leaks before freeing */
	AllocSetCheck(context);
#endif

	/* Make it look empty, just in case... */
	MemSetAligned(set->freelist, 0, sizeof(set->freelist));
	set->blocks = NULL;
	set->keeper = NULL;
	// Walk the block list and free every block in this context, including the keeper block.
	while (block != NULL)
	{
		AllocBlock	next = block->next;

#ifdef CLOBBER_FREED_MEMORY
		/* Wipe freed memory for debugging purposes */
		memset(block, 0x7F, block->freeptr - ((char *) block));
#endif
		free(block);
		block = next;
	}
}
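AllocSetDelete only releases the blocks; as its header comment says, it is not responsible for the context node itself. The sketch below separates those two responsibilities with a hypothetical method table (ToyMethods) and a generic toy_context_delete wrapper that first calls the set-specific delete method and then frees the node. It follows the MemoryContextMethods dispatch idea described earlier, but every name and the layout here are hypothetical, and the node is allocated with plain malloc() rather than in TopMemoryContext.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct ToyBlock
{
	struct ToyBlock *next;
} ToyBlock;

typedef struct ToyContext ToyContext;

/* Hypothetical virtual-function table, in the spirit of MemoryContextMethods */
typedef struct ToyMethods
{
	void		(*delete_context) (ToyContext *context);
} ToyMethods;

struct ToyContext
{
	const ToyMethods *methods;
	ToyBlock   *blocks;
	ToyBlock   *keeper;
};

static int	blocks_freed = 0;	/* counter used only to observe the behaviour */

/* Set-specific delete: free every block, the keeper included, but not the node */
static void
toy_set_delete(ToyContext *context)
{
	ToyBlock   *block;

	assert(context != NULL);
	block = context->blocks;

	/* Make it look empty first, as AllocSetDelete does */
	context->blocks = NULL;
	context->keeper = NULL;

	while (block != NULL)
	{
		ToyBlock   *next = block->next;

		free(block);
		blocks_freed++;
		block = next;
	}
}

static const ToyMethods toy_set_methods = {toy_set_delete};

/* Generic wrapper: let the method release the contents, then free the node */
static void
toy_context_delete(ToyContext *context)
{
	context->methods->delete_context(context);
	free(context);
}

static ToyContext *
toy_context_create(int nblocks)
{
	ToyContext *context = malloc(sizeof(ToyContext));

	if (context == NULL)
		return NULL;
	context->methods = &toy_set_methods;
	context->blocks = NULL;
	context->keeper = NULL;
	for (int i = 0; i < nblocks; i++)
	{
		ToyBlock   *block = malloc(sizeof(ToyBlock));

		if (block == NULL)
			break;
		block->next = context->blocks;
		context->blocks = block;
		if (context->keeper == NULL)
			context->keeper = block;	/* first block acts as the keeper */
	}
	return context;
}
```

Keeping the node's release out of the method means each context type only has to know how to free its own blocks, while unlinking and freeing the node stays in one generic place.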

From: https://blog.51cto.com/u_15534346/8072983