lightdb hash index的性能和限制

标签：index rows hash lightdb .. cost time loops actual

　　除了btree外，lightdb是支持hash index的，但是总体来说支持的特性范围均不如btree索引，比如parallel没有btree索引智能，不支持=之外的操作，不支持bitmap index scan，不支持哈希唯一索引（但可以通过unique not null约束变量实现）等等，但在内存存储不使用固定长度的场景下还是很有优势的。如下：

lightdb@postgres=# create table t(id int);
WARNING:  LightDB DDL check warn! no primary key!
DETAIL:  If your system do not have data replication requirment, just ignore it
CREATE TABLE
lightdb@postgres=# create index idx_t on t using hash(id);
CREATE INDEX

lightdb@postgres=# insert into t select id from generate_series(1,10000000) id;
INSERT 0 10000000

lightdb@postgres=# explain select count(1) from t t1,t t2 where t1.id=t2.id;
                                       QUERY PLAN                                       
----------------------------------------------------------------------------------------
 Gather  (cost=854748.00..854748.11 rows=1 width=8)
   Workers Planned: 1
   Single Copy: true
   ->  Aggregate  (cost=844748.00..844748.01 rows=1 width=8)
         ->  Hash Join  (cost=308311.00..819748.00 rows=10000000 width=0)
               Hash Cond: (t1.id = t2.id)
               ->  Seq Scan on t t1  (cost=0.00..144248.00 rows=10000000 width=4)
               ->  Hash  (cost=144248.00..144248.00 rows=10000000 width=4)
                     ->  Seq Scan on t t2  (cost=0.00..144248.00 rows=10000000 width=4)
(9 rows)

Time: 0.953 ms
lightdb@postgres=# select count(1) from t t1,t t2 where t1.id=t2.id;
  count   
----------
 10000000
(1 row)

Time: 10584.519 ms (00:10.585)

下面是OLTP的示例：

lightdb@postgres=# explain (analyze,verbose) select * from t where id in (1,2,3,4);
                                                     QUERY PLAN                                                     
--------------------------------------------------------------------------------------------------------------------
 Gather  (cost=10004.03..10008.49 rows=4 width=4) (actual time=16.539..19.560 rows=4 loops=1)
   Output: id
   Workers Planned: 1
   Workers Launched: 1
   Single Copy: true
   ->  Bitmap Heap Scan on public.t  (cost=4.03..8.09 rows=4 width=4) (actual time=0.378..0.382 rows=4 loops=1)
         Output: id
         Recheck Cond: (t.id = ANY ('{1,2,3,4}'::integer[]))
         Worker 0:  actual time=0.378..0.382 rows=4 loops=1
         ->  Bitmap Index Scan on idx_t  (cost=0.00..4.03 rows=4 width=0) (actual time=0.296..0.296 rows=4 loops=1)
               Index Cond: (t.id = ANY ('{1,2,3,4}'::integer[]))
               Worker 0:  actual time=0.296..0.296 rows=4 loops=1
 Planning Time: 0.285 ms
 Execution Time: 19.812 ms
(14 rows)

Time: 21.320 ms

===============

对应的btree如下：

lightdb@postgres=# create index idx_t on t(id);
CREATE INDEX
Time: 7298.174 ms (00:07.298)
lightdb@postgres=# explain (analyze,verbose) select * from t where id in (1,2,3,4);
                                                        QUERY PLAN                                                         
---------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=10000.43..10007.21 rows=4 width=4) (actual time=9.571..11.403 rows=4 loops=1)
   Output: id
   Workers Planned: 1
   Workers Launched: 1
   Single Copy: true
   ->  Index Only Scan using idx_t on public.t  (cost=0.43..6.81 rows=4 width=4) (actual time=0.176..0.185 rows=4 loops=1)
         Output: id
         Index Cond: (t.id = ANY ('{1,2,3,4}'::integer[]))
         Heap Fetches: 4
         Worker 0:  actual time=0.176..0.185 rows=4 loops=1
 Planning Time: 0.380 ms
 Execution Time: 11.588 ms
(12 rows)

Time: 12.863 ms

lightdb@postgres=# select count(1) from t t1,t t2 where t1.id=t2.id;
  count   
----------
 10000000
(1 row)

Time: 6334.176 ms (00:06.334)

lightdb@postgres=# explain select count(1) from t t1,t t2 where t1.id=t2.id;
                                                    QUERY PLAN                                                     
-------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=525399.81..525399.82 rows=1 width=8)
   ->  Gather  (cost=525399.69..525399.80 rows=1 width=8)
         Workers Planned: 1
         ->  Partial Aggregate  (cost=515399.69..515399.70 rows=1 width=8)
               ->  Merge Join  (cost=0.87..500693.81 rows=5882353 width=0)
                     Merge Cond: (t1.id = t2.id)
                     ->  Parallel Index Only Scan using idx_t on t t1  (cost=0.43..180493.96 rows=5882353 width=4)
                     ->  Index Only Scan using idx_t on t t2  (cost=0.43..221670.43 rows=10000000 width=4)
(8 rows)

Time: 1.441 ms

====== 换成hash join
lightdb@postgres=# set enable_mergejoin to off;
SET
Time: 0.681 ms
lightdb@postgres=# explain select count(1) from t t1,t t2 where t1.id=t2.id;
                                                       QUERY PLAN                                                        
-------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=644560.95..644560.96 rows=1 width=8)
   ->  Gather  (cost=644560.83..644560.94 rows=1 width=8)
         Workers Planned: 1
         ->  Partial Aggregate  (cost=634560.83..634560.84 rows=1 width=8)
               ->  Parallel Hash Join  (cost=277001.81..619854.95 rows=5882353 width=0)
                     Hash Cond: (t1.id = t2.id)
                     ->  Parallel Index Only Scan using idx_t on t t1  (cost=0.43..180493.96 rows=5882353 width=4)
                     ->  Parallel Hash  (cost=180493.96..180493.96 rows=5882353 width=4)
                           ->  Parallel Index Only Scan using idx_t on t t2  (cost=0.43..180493.96 rows=5882353 width=4)
(9 rows)

Time: 1.492 ms
lightdb@postgres=# select count(1) from t t1,t t2 where t1.id=t2.id;
  count   
----------
 10000000
(1 row)

Time: 9897.255 ms (00:09.897)

无论是否添加唯一索引都差不多。如下：

lightdb@postgres=# alter table t add unique(id);
ALTER TABLE
Time: 6454.915 ms (00:06.455)
lightdb@postgres=# create index idx_t on t using hash(id);
CREATE INDEX
Time: 24519.773 ms (00:24.520)
lightdb@postgres=# select count(1) from t t1,t t2 where t1.id=t2.id;
  count   
----------
 10000000
(1 row)

Time: 10515.582 ms (00:10.516)

上面是int类型的情况，下面来看下字符串类型值相对比较长的情况。

-- hash索引
lightdb@postgres=# alter table t add column v varchar(100);
ALTER TABLE
Time: 1.614 ms
lightdb@postgres=# update t set v = uuid();
UPDATE 10000000
Time: 74764.381 ms (01:14.764)

lightdb@postgres=# drop index idx_v ;
DROP INDEX
Time: 358.055 ms
lightdb@postgres=# create index idx_v on t using hash(v);
CREATE INDEX
Time: 26212.556 ms (00:26.213)
lightdb@postgres=# explain (analyze,verbose) select count(1) from t t1,t t2 where t1.v=t2.v;
                                                                    QUERY PLAN                                                                     
---------------------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1109415.94..1109416.05 rows=1 width=8) (actual time=16827.289..16843.024 rows=1 loops=1)
   Output: (count(1))
   Workers Planned: 1
   Workers Launched: 1
   Single Copy: true
   ->  Aggregate  (cost=1099415.94..1099415.95 rows=1 width=8) (actual time=16806.915..16806.986 rows=1 loops=1)
         Output: count(1)
         Worker 0:  actual time=16806.915..16806.986 rows=1 loops=1
         ->  Hash Join  (cost=396582.98..1074415.94 rows=9999999 width=0) (actual time=5346.963..16158.927 rows=10000000 loops=1)
               Hash Cond: ((t1.v)::text = (t2.v)::text)
               Worker 0:  actual time=5346.963..16158.927 rows=10000000 loops=1
               ->  Seq Scan on public.t t1  (cost=0.00..193457.99 rows=9999999 width=37) (actual time=0.208..2068.391 rows=10000000 loops=1)
                     Output: t1.v
                     Worker 0:  actual time=0.208..2068.391 rows=10000000 loops=1
               ->  Hash  (cost=193457.99..193457.99 rows=9999999 width=37) (actual time=5334.495..5334.558 rows=10000000 loops=1)
                     Output: t2.v
                     Buckets: 65536  Batches: 256  Memory Usage: 3149kB
                     Worker 0:  actual time=5334.495..5334.558 rows=10000000 loops=1
                     ->  Seq Scan on public.t t2  (cost=0.00..193457.99 rows=9999999 width=37) (actual time=0.054..2165.095 rows=10000000 loops=1)
                           Output: t2.v
                           Worker 0:  actual time=0.054..2165.095 rows=10000000 loops=1
 Planning Time: 1.016 ms
 Execution Time: 16843.381 ms
(23 rows)

Time: 16845.926 ms (00:16.846)

-- 因为内存小，batches交互多，所以增加work_mem消除I/O影响。
lightdb@postgres=# set work_mem to '128MB';
SET
Time: 0.608 ms
lightdb@postgres=# explain (analyze,verbose) select count(1) from t t1,t t2 where t1.v=t2.v;
                                                                    QUERY PLAN                                                                     
---------------------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1109415.94..1109416.05 rows=1 width=8) (actual time=22352.372..22388.694 rows=1 loops=1)
   Output: (count(1))
   Workers Planned: 1
   Workers Launched: 1
   Single Copy: true
   ->  Aggregate  (cost=1099415.94..1099415.95 rows=1 width=8) (actual time=22335.328..22335.363 rows=1 loops=1)
         Output: count(1)
         Worker 0:  actual time=22335.328..22335.363 rows=1 loops=1
         ->  Hash Join  (cost=396582.98..1074415.94 rows=9999999 width=0) (actual time=5627.623..21684.815 rows=10000000 loops=1)
               Hash Cond: ((t1.v)::text = (t2.v)::text)
               Worker 0:  actual time=5627.623..21684.815 rows=10000000 loops=1
               ->  Seq Scan on public.t t1  (cost=0.00..193457.99 rows=9999999 width=37) (actual time=0.262..1901.155 rows=10000000 loops=1)
                     Output: t1.v
                     Worker 0:  actual time=0.262..1901.155 rows=10000000 loops=1
               ->  Hash  (cost=193457.99..193457.99 rows=9999999 width=37) (actual time=5600.002..5600.028 rows=10000000 loops=1)
                     Output: t2.v
                     Buckets: 2097152  Batches: 8  Memory Usage: 100708kB
                     Worker 0:  actual time=5600.002..5600.028 rows=10000000 loops=1
                     ->  Seq Scan on public.t t2  (cost=0.00..193457.99 rows=9999999 width=37) (actual time=0.188..2076.519 rows=10000000 loops=1)
                           Output: t2.v
                           Worker 0:  actual time=0.188..2076.519 rows=10000000 loops=1
 Planning Time: 0.415 ms
 Execution Time: 22388.947 ms
(23 rows)

Time: 22390.522 ms (00:22.391)
lightdb@postgres=# set work_mem to '1280MB';
SET
Time: 0.651 ms
lightdb@postgres=# explain (analyze,verbose) select count(1) from t t1,t t2 where t1.v=t2.v;
                                                                    QUERY PLAN                                                                     
---------------------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=796915.94..796916.05 rows=1 width=8) (actual time=18028.588..18412.918 rows=1 loops=1)
   Output: (count(1))
   Workers Planned: 1
   Workers Launched: 1
   Single Copy: true
   ->  Aggregate  (cost=786915.94..786915.95 rows=1 width=8) (actual time=18017.839..18017.861 rows=1 loops=1)
         Output: count(1)
         Worker 0:  actual time=18017.839..18017.861 rows=1 loops=1
         ->  Hash Join  (cost=318457.98..761915.94 rows=9999999 width=0) (actual time=7028.907..17394.653 rows=10000000 loops=1)
               Hash Cond: ((t1.v)::text = (t2.v)::text)
               Worker 0:  actual time=7028.907..17394.653 rows=10000000 loops=1
               ->  Seq Scan on public.t t1  (cost=0.00..193457.99 rows=9999999 width=37) (actual time=0.141..1507.013 rows=10000000 loops=1)
                     Output: t1.id, t1.v
                     Worker 0:  actual time=0.141..1507.013 rows=10000000 loops=1
               ->  Hash  (cost=193457.99..193457.99 rows=9999999 width=37) (actual time=6881.732..6881.746 rows=10000000 loops=1)
                     Output: t2.v
                     Buckets: 16777216  Batches: 1  Memory Usage: 804901kB
                     Worker 0:  actual time=6881.732..6881.746 rows=10000000 loops=1
                     ->  Seq Scan on public.t t2  (cost=0.00..193457.99 rows=9999999 width=37) (actual time=0.128..1781.326 rows=10000000 loops=1)
                           Output: t2.v
                           Worker 0:  actual time=0.128..1781.326 rows=10000000 loops=1
 Planning Time: 0.327 ms
 Execution Time: 18413.120 ms
(23 rows)

Time: 18414.684 ms (00:18.415)

-- b树索引
lightdb@postgres=# drop index idx_v ;
DROP INDEX
Time: 133.610 ms
lightdb@postgres=# create index idx_v on t(v);
CREATE INDEX
Time: 99610.086 ms (01:39.610)
lightdb@postgres=# explain (analyze,verbose) select count(1) from t t1,t t2 where t1.v=t2.v;
                                                                                     QUERY PLAN                                                                                      
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=635535.74..635535.76 rows=1 width=8) (actual time=55059.701..55482.026 rows=1 loops=1)
   Output: count(1)
   ->  Gather  (cost=635535.53..635535.74 rows=2 width=8) (actual time=54912.085..55481.978 rows=3 loops=1)
         Output: (PARTIAL count(1))
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial Aggregate  (cost=625535.53..625535.54 rows=1 width=8) (actual time=54888.937..54888.960 rows=1 loops=3)
               Output: PARTIAL count(1)
               Worker 0:  actual time=54886.263..54886.305 rows=1 loops=1
               Worker 1:  actual time=54868.875..54868.890 rows=1 loops=1
               ->  Parallel Hash Join  (cost=309296.10..615118.86 rows=4166666 width=0) (actual time=26238.000..54499.870 rows=3333333 loops=3)
                     Hash Cond: ((t1.v)::text = (t2.v)::text)
                     Worker 0:  actual time=26224.367..54500.826 rows=3357480 loops=1
                     Worker 1:  actual time=26223.631..54484.862 rows=3362240 loops=1
                     ->  Parallel Index Only Scan using idx_v on public.t t1  (cost=0.56..257212.22 rows=4166666 width=37) (actual time=0.112..23401.164 rows=3333333 loops=3)
                           Output: t1.v
                           Heap Fetches: 10000000
                           Worker 0:  actual time=0.114..23476.367 rows=3357480 loops=1
                           Worker 1:  actual time=0.100..23249.643 rows=3362240 loops=1
                     ->  Parallel Hash  (cost=257212.22..257212.22 rows=4166666 width=37) (actual time=26176.344..26176.362 rows=3333333 loops=3)
                           Output: t2.v
                           Buckets: 16777216  Batches: 1  Memory Usage: 835968kB
                           Worker 0:  actual time=26223.984..26224.022 rows=3279920 loops=1
                           Worker 1:  actual time=26223.158..26223.169 rows=3353980 loops=1
                           ->  Parallel Index Only Scan using idx_v on public.t t2  (cost=0.56..257212.22 rows=4166666 width=37) (actual time=0.198..23040.507 rows=3333333 loops=3)
                                 Output: t2.v
                                 Heap Fetches: 10000000
                                 Worker 0:  actual time=0.136..23046.634 rows=3279920 loops=1
                                 Worker 1:  actual time=0.327..22978.942 rows=3353980 loops=1
 Planning Time: 0.559 ms
 Execution Time: 55482.322 ms
(31 rows)

Time: 55484.995 ms (00:55.485)

从上可知，在字符串下，hash索引相比btree索引性能高3倍以上。hash索引还是很有价值的。

标签：index,rows,hash,lightdb,..,cost,time,loops,actual
From： https://www.cnblogs.com/lightdb/p/18237588

lightdb hash index的性能和限制

相关文章

赞助商

阅读排行