这是cmu 15-445第五节课程Storage Models & Compression的上半部分,主要包括Storage Models的内容,压缩部分下次再整理,学完这部分可以去做hw2的第一部分
课程主页:CMU 15-445/645 :: Intro to Database Systems (Fall 2023)
(有几张图片目前没上传,过两天补一下)
Database Workloads
-
OLTP(On-line Transaction Processing)→处理
Simple queries that read/update a small amount of data that is related to a single entity in the database
An OLTP workload is characterized by fast, short running operations, repetitive operations and simple queries that operate on single entity at a time.
- 特点:操作次数较多,但操作相对较简单,更多的与read有关。
-
OLAP(On-line Analytical Processing)→分析
Complex queries that read large portions of the database spanning multiple entities.
An OLAP workload is characterized by long running, complex queries and reads on large portions of the database.
- 特点: 操作会较为复杂,可能为一个属性涉及多个实体。
Storage Model
- NSM(N-ary storage model)
-
The DBMS stores (almost) all attributes for a single tuple contiguously in a single page.
-
row storage(行存储):将一行数据的所有列值存储在一起
-
适合OLTP系统→当需要插入一个新纪录,或某个记录的多个列,只需访问相应的行数据
-
优点:
- Fast inserts, updates, and deletes.
- Good for queries that need the entire tuple (OLTP).
- Can use index-oriented physical storage for clustering.
- 可以高效的支持聚簇索引→可以使用索引导向物理存储。
-
缺点:
- Not good for scanning large portions of the table and/or a subset of the attributes.
- Terrible memory locality in access patterns.
- memory locality(内存局部性):在访问内存时,数据在内存中的位置是否靠近。如果访问的数据在内存中相邻,那么内存局部性好,性能高;反之,则内存局部性差,性能低。
- 在访问特定列数据时会出现这个问题导致频繁的内存页面切换和缓存未命中,降低访问效率。
- Not ideal for compression because of multiple value domains within a single page.
- 压缩通常以来数据的相似性。
-
- DECOMPOSITION STORAGE MODEL (DSM)
-
The DBMS stores a single attribute for all tuples contiguously in a block of data.
-
Column Storage(列存储):不是按实体,而是按属性放一起
-
适合OLAP系统→可以在一个表内就查到很多实体的某个属性
-
元组识别(重新组合为完整的行)
- Fixed-length Offsets
- 若对一个属性,每个值长度都一样,可以按照偏移量确定同一行
- Embedded Tuple Ids
- 为每个值添加一个饿对应的tuple id
- Fixed-length Offsets
-
优势:
- Reduces the amount wasted I/O per query because the DBMS only reads the data that it needs.
- 查某个实体的某个属性时就不会查该实体的别的属性
- Faster query processing because of increased locality and cached data reuse.
- Better data compression
- 更好压缩(下文
- Reduces the amount wasted I/O per query because the DBMS only reads the data that it needs.
-
缺点:
- Slow for point queries, inserts, updates, and deletes because of tuple splitting/stitching/reorganization.
-
- PAX(Partition Attributes Across)混合储存模型
-
Partition Attributes Across (PAX) is a hybrid storage model that vertically partitions attributes within a database page.
-
The goal is to get the benefit of faster processing on columnar storage while retaining the spatial locality benefits of row storage.
-
将数据按照行分区(页级别),在每个数据页内按照列存储,
-