KingbaseES 为什么select主键列不走索引

标签：rows hit .. 列不走扫描 cost shared KingbaseES 主键

背景

有客户提出一个问题。一个类似这样的SQL语句，select count(id) from 为什么执行计划用全表扫，不用索引。id列上有主键。

分析

test=# explain (analyze, buffers ) select count(id) from t1;
                                                   QUERY PLAN
----------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=3385.00..3385.01 rows=1 width=8) (actual time=33.969..33.970 rows=1 loops=1)
   Buffers: shared hit=885
   ->  Seq Scan on t1  (cost=0.00..2885.00 rows=200000 width=4) (actual time=0.025..16.867 rows=200000 loops=1)
         Buffers: shared hit=885
 Planning Time: 0.044 ms
 Execution Time: 33.992 ms
(6 rows)

如下，临时关闭全表扫描，执行计划用到了索引，而且用到了并行任务，执行计划时间77毫秒，显然并行索引扫描的cost的估算完成成本4937.75明显要高于全表扫描完成成本的2885。

test=# set enable_seqscan =off;
SET
test=# explain (analyze, buffers ) select count(id) from t1;
                                                                       QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------
--------------------------
 Finalize Aggregate  (cost=6146.30..6146.31 rows=1 width=8) (actual time=77.077..77.164 rows=1 loops=1)
   Buffers: shared hit=891 read=551
   ->  Gather  (cost=6146.09..6146.30 rows=2 width=8) (actual time=76.250..77.157 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         Buffers: shared hit=891 read=551
         ->  Partial Aggregate  (cost=5146.09..5146.10 rows=1 width=8) (actual time=65.968..65.969 rows=1 loops=3)
               Buffers: shared hit=891 read=551
               ->  Parallel Index Only Scan using id_pkey on t1  (cost=0.42..4937.75 rows=83333 width=4) (actual time=9.001..5
3.003 rows=66667 loops=3)
                     Heap Fetches: 200000
                     Buffers: shared hit=891 read=551
 Planning Time: 0.073 ms
 Execution Time: 77.191 ms
(13 rows)

如下，关闭并行任务后重试，执行计划用到了index only scan，它的意思是不需要回表的索引扫描，结果集所需要的数据从索引中就可以获取，因为 select id 列上有主键。关闭并行后，这次执行计划对比第一个全表扫描的执行计划步骤极其相似，区别在于seq scan 和 index only scan。我们可以看到执行时间一个是33毫秒，一个是50毫秒。为什么执行时间上有此差别？

test=# set max_parallel_workers_per_gather=0;
SET
test=#
test=# explain (analyze, buffers ) select count(id) from t1;
                                                             QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------
-------
 Aggregate  (cost=6604.42..6604.43 rows=1 width=8) (actual time=50.032..50.033 rows=1 loops=1)
   Buffers: shared hit=1437
   ->  Index Only Scan using id_pkey on t1  (cost=0.42..6104.42 rows=200000 width=4) (actual time=0.018..34.042 rows=200000 lo
ops=1)
         Heap Fetches: 200000
         Buffers: shared hit=1437
 Planning Time: 0.059 ms
 Execution Time: 50.059 ms
(7 rows)

如下，执行vacuum t1后， heap fetches由200000变成了0，执行时间从50毫秒变成了35毫秒。heap fetches 代表需要扫描的数据块个数。虽然Index Only Scan 可以从索引直接输出结果，但是因为MVCC机制，需要对扫描的元组进行可见性判断，即检查visibility MAP 文件。当新建表之后，如果没有进行过vacuum或autovacuum操作，这时还没有VM文件，而索引没有保存记录的版本信息，索引Index Only Scan 还是需要扫描数据块来获取版本信息。

此时的执行计划和第一次执行计划的全表扫描时间更接近了，区别是全表扫描的执行时间是33毫秒，index only scan的执行时间是35毫秒。我们分析一下，全表扫描执行计划中，buffers：shared hit=885,index only scan执行计划中buffers：shared hit=553。shared hit表示从缓存中命中的page数，既然全表扫描执行计划缓存中命中数高于index only scan，为什么执行计划时间更短呢？因为涉及数据的扫描方式，索引扫描涉及随机IO，而顺序扫描也就是全表扫描涉及顺序IO。大多数情况下，Index Scan 要比 Seq Scan 快。但是如果获取的结果集占所有数据的比重很大时，这时优化器计算成本后可能不会选择使用index Scan，只有当计算得出顺序IO的代价大于随机IO时，才会选择索引扫描。本案例中，返回全表结果集数据，优化器计算随机IO的获取成本无疑高于顺序IO，是否使用索引扫描和需要扫描的数据量，列的唯一值占总行数比值有关。

test=# vacuum t1;
VACUUM
test=#
test=# explain (analyze, buffers ) select count(id) from t1;
                                                             QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------
-------
 Aggregate  (cost=5716.42..5716.43 rows=1 width=8) (actual time=35.790..35.791 rows=1 loops=1)
   Buffers: shared hit=553
   ->  Index Only Scan using id_pkey on t1  (cost=0.42..5216.42 rows=200000 width=4) (actual time=0.014..20.739 rows=200000 lo
ops=1)
         Heap Fetches: 0
         Buffers: shared hit=553
 Planning Time: 0.086 ms
 Execution Time: 35.815 ms
(7 rows)

标签：rows,hit,..,列不走,扫描,cost,shared,KingbaseES,主键
From： https://www.cnblogs.com/kingbase/p/17921588.html

KingbaseES 为什么select主键列不走索引

背景

分析

相关文章

赞助商

阅读排行