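In the Greenplum executor, hash distribution mainly involves the Motion, Result, and SplitUpdate nodes. This article follows the CdbHash API as its thread for studying how these executor nodes handle hash distribution: CdbHash *makeCdbHash(int numsegs, int natts, Oid *hashfuncs) creates a CdbHash struct; cdbhashinit() performs initialization, which does nothing more than reset the initial hash value; cdbhash() calls hashDatum() to preprocess the value according to its type and then adds the processed column value to the hash computation; cdbhashreduce() maps the hash value to a segment.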
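The create / init / accumulate / reduce life cycle described above can be sketched with a small stand-alone model. This is only a toy under stated assumptions, not Greenplum's implementation: the ToyHash struct, every toy_* function, the FNV-1a column hash, the rotate-and-XOR combine step, and the modulo reduction are all illustrative choices made for this sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Toy stand-in for CdbHash (hypothetical; the real struct has more fields). */
typedef struct ToyHash {
    uint32_t hash;     /* running hash value for the current row */
    int      numsegs;  /* number of segments to map onto */
    int      natts;    /* number of distribution key columns */
} ToyHash;

/* Plays the role of makeCdbHash(): allocate once and record the layout. */
ToyHash *toy_make_hash(int numsegs, int natts)
{
    assert(numsegs > 0 && natts >= 0);
    ToyHash *h = malloc(sizeof(ToyHash));
    if (h == NULL)
        abort();
    h->hash = 0;
    h->numsegs = numsegs;
    h->natts = natts;
    return h;
}

/* Plays the role of cdbhashinit(): only resets the running value. */
void toy_hash_init(ToyHash *h)
{
    h->hash = 0;
}

/* Assumed per-column hash: FNV-1a over the four bytes of an int32. */
static uint32_t toy_hash_int32(int32_t v)
{
    uint32_t x = (uint32_t) v;
    uint32_t k = 2166136261u;
    for (int i = 0; i < 4; i++) {
        k ^= (x >> (8 * i)) & 0xffu;
        k *= 16777619u;
    }
    return k;
}

/* Plays the role of cdbhash(): fold one column into the running value. */
void toy_hash_add(ToyHash *h, int32_t value, int is_null)
{
    h->hash = (h->hash << 1) | (h->hash >> 31);  /* rotate left by one bit */
    if (!is_null)
        h->hash ^= toy_hash_int32(value);        /* NULL columns add nothing */
}

/* Plays the role of cdbhashreduce(): map the value onto [0, numsegs). */
int toy_hash_reduce(const ToyHash *h)
{
    return (int) (h->hash % (uint32_t) h->numsegs);
}

/* One full pass for a row: init, add every key column, reduce. */
int toy_segment_for_row(ToyHash *h, const int32_t *cols, const int *nulls)
{
    toy_hash_init(h);
    for (int i = 0; i < h->natts; i++)
        toy_hash_add(h, cols[i], nulls[i]);
    return toy_hash_reduce(h);
}
```

The struct is allocated once and reused for every row, and only the running value is reset per row. That matches the division of labor in the text, where makeCdbHash is called at node initialization and cdbhashinit is called per tuple.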
Motion
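Hash distribution only comes into play for a backend whose Motion type is MOTIONTYPE_HASH and which performs the sending task (MOTIONSTATE_SEND), that is, when motionstate->mstype == MOTIONSTATE_SEND && node->motionType == MOTIONTYPE_HASH. In other words, the backend has to send the tuples it processes directly to other backends, and the receiving backend is located by hashing the distribution key values and mapping the hash to a segment. The CdbHash is created during initialization, with the call stack ExecInitNode --> ExecInitMotion --> makeCdbHash.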
The execution call stack of a Motion that involves hashing is ExecMotion --> execMotionSender --> doSendTuple --> eval (nodeMotion.c) --> cdbhashinit and cdbhash. When doSendTuple sends a tuple and the Motion type is MOTIONTYPE_HASH, it has to compute the index of the target segment and store it in the targetRoute variable. The function that computes it is eval:
uint32 eval(ExprContext *econtext, List *hashkeys, CdbHash * h) {
    ListCell *hk;
    unsigned int target_seg;

    ResetExprContext(econtext);
    /* Switch to the per-tuple memory context (ecxt_per_tuple_memory) */
    MemoryContext oldContext = MemoryContextSwitchTo(econtext->ecxt_per_tuple_memory);

    /*
     * If we have 1 or more distribution keys for this relation, hash them.
     * However, if this happens to be a relation with an empty policy
     * (partitioning policy with a NULL distribution key list) then we have
     * no hash key value to feed in, so use cdbhashrandomseg() to pick a
     * segment at random.
     */
    if (list_length(hashkeys) > 0) {
        int i = 0;

        cdbhashinit(h);
        foreach(hk, hashkeys) {
            ExprState *keyexpr = (ExprState *) lfirst(hk);
            Datum keyval;
            bool isNull;

            keyval = ExecEvalExpr(keyexpr, econtext, &isNull); /* Get the attribute value of the tuple */
            cdbhash(h, i + 1, keyval, isNull); /* Compute the hash function */
            i++;
        }
        target_seg = cdbhashreduce(h);
    } else {
        target_seg = cdbhashrandomseg(h->numsegs);
    }

    MemoryContextSwitchTo(oldContext);
    return target_seg;
}
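The two branches of the function above (hash the keys when the key list is non-empty, otherwise pick a segment at random) can be tried out without a Greenplum build using the sketch below. It is an illustration under assumptions: toy_target_segment and toy_mix are hypothetical names, the integer mixer and the modulo reduction are stand-ins chosen for this sketch, and rand() merely plays the part of cdbhashrandomseg().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-key hash: any deterministic 32-bit mixer works here. */
static uint32_t toy_mix(uint32_t x)
{
    x ^= x >> 16;
    x *= 0x45d9f3bu;
    x ^= x >> 16;
    return x;
}

/*
 * Sketch of the routing decision: with one or more keys the target is a
 * deterministic function of the key values; with no keys it is random.
 */
int toy_target_segment(const uint32_t *keys, const int *nulls,
                       int nkeys, int numsegs)
{
    assert(numsegs > 0);

    if (nkeys > 0) {
        uint32_t h = 0;                      /* fresh running value per tuple */
        for (int i = 0; i < nkeys; i++) {
            h = (h << 1) | (h >> 31);        /* rotate left by one bit */
            if (!nulls[i])
                h ^= toy_mix(keys[i]);       /* NULL keys contribute nothing */
        }
        return (int) (h % (uint32_t) numsegs);
    }

    /* Empty distribution key list: any segment will do. */
    return rand() % numsegs;
}
```

Calling toy_target_segment twice with the same keys returns the same segment, which is the property that lets a sender route rows with equal distribution keys to the same receiver. With nkeys == 0 the result is only guaranteed to lie in [0, numsegs).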