标签：tuple free project 插入哈希 table Query Execution page

15445 project 3 QUERY EXECUTION

结构设计

Catalog 记录了数据库的元数据，记录了数据库有那些表和哪些索引，还包含了 TableInfo 和 IndexInfo 结构，记录了表和索引的元数据。

对于 TableInfo 和 IndexInfo，最重要的是表/索引的结构，封装为 Schema 类。通过 Schema，我们可以获得所有的 Column，也可以获得关于 columns 的信息，比如 column 的长度限制、column的个数。关于一个列的元数据，则用 Column 类表示。

因此，我们可以得到这样的包含关系：Column < Schema < TableInfo/IndexInfo < Catalog。

Task

TASK #1 - CREATING A CATALOG TABLE

需要实现 CreateTable、GetTable(const std::string &table_name)、和GetTable(table_oid_t table_oid)方法。

/**
   * Map table identifier -> table metadata.
   *
   * NOTE: `tables_` owns all table metadata.
   */
std::unordered_map<table_oid_t, std::unique_ptr<TableInfo>> tables_;

/** Map table name -> table identifiers. */
std::unordered_map<std::string, table_oid_t> table_names_;

/** The next table identifier to be used. */
std::atomic<table_oid_t> next_table_oid_{0};

维护好这三个成员变量，CreatTable 时，将新表的名字和标识符的映射，新表的标识符和 TableInfo 指针的映射，添加到 tables_ 和 table_names_ 中。

TASK #2 - EXECUTORS

SEQUENTIAL SCANS

通过 TableIterator 对表进行遍历，只要满足谓词，就返回所指向的 tuple；若不满足，则继续往下迭代。

INSERT

有两种类型的插入：

raw inserts：要插入的值直接在 InsertPlanNode 中，在执行计划中；
not-raw inserts：要插入的值在 child_executor_ 中，需要从子执行程序中获取要插入的值，调用 child_executor_->Next(&tuple)获取要插入的值。

HASH JOIN

Hash Join 分为两个阶段：

利用左表的 key 构建哈希表，哈希表的每个 bucket 尽量装有相同值的 tuple。
对于右表的每个 tuple 在利用左表构建的哈希表中进行查询，匹配到某个 bucket，然后在该 bucket 中，进行 Nested Join，匹配到符合 join 条件的左表的 tuple，然后利用 out_schema 构造输出的 tuple，并返回该tuple，即 join 之后的结果。

AGGREGATION

利用 SimpleAggregationHashTable，AggregationExecutor 初始化时，就是将所有子执行器提供的 tuple 插入到哈希表中，每一次插入都会更新聚合结果，初始化完成即聚合完成。执行 AggregationExecutor，只需要利用哈希表迭代器 aht_iterator_ 进行迭代，将符合谓词的 tuple 返回即可。

TmpTuplePage

TmpTuplePage format:
Sizes are in bytes.
| PageId (4) | LSN (4) | FreeSpace (4) | (free space) | TupleSize2 | TupleData2 | TupleSize1 | TupleData1 |
													  ^
												free space pointer

前12个字节是 page header，维护 TmpTuplePage 的元数据：page id、lsn 和 free space pointer。free space pointer 是相对于page id而言的偏移量，如果 page 中一个 tuple 都没有，那么 free space pointer = PAGE_SIZE。

插入 tuple是从 page 的末尾开始插入的。每次先插入 TupleData，进而插入占用4个字节的 TupleSize。TupleData的大小不定，为 TupleSize。所以每次插入一个 tuple 占用的空间为 tuple.size_ + 4。

标签：tuple,free,project,插入,哈希,table,Query,Execution,page
From： https://www.cnblogs.com/oneOmega/p/16751547.html

15-445 project #3 - Query Execution