除了btree外,lightdb是支持hash index的,但是总体来说支持的特性范围均不如btree索引,比如parallel没有btree索引智能,不支持=之外的操作,不支持bitmap index scan,不支持哈希唯一索引(但可以通过unique not null约束变量实现)等等,但在内存存储不使用固定长度的场景下还是很有优势的。如下:
lightdb@postgres=# create table t(id int); WARNING: LightDB DDL check warn! no primary key! DETAIL: If your system do not have data replication requirment, just ignore it CREATE TABLE lightdb@postgres=# create index idx_t on t using hash(id); CREATE INDEX lightdb@postgres=# insert into t select id from generate_series(1,10000000) id; INSERT 0 10000000 lightdb@postgres=# explain select count(1) from t t1,t t2 where t1.id=t2.id; QUERY PLAN ---------------------------------------------------------------------------------------- Gather (cost=854748.00..854748.11 rows=1 width=8) Workers Planned: 1 Single Copy: true -> Aggregate (cost=844748.00..844748.01 rows=1 width=8) -> Hash Join (cost=308311.00..819748.00 rows=10000000 width=0) Hash Cond: (t1.id = t2.id) -> Seq Scan on t t1 (cost=0.00..144248.00 rows=10000000 width=4) -> Hash (cost=144248.00..144248.00 rows=10000000 width=4) -> Seq Scan on t t2 (cost=0.00..144248.00 rows=10000000 width=4) (9 rows) Time: 0.953 ms lightdb@postgres=# select count(1) from t t1,t t2 where t1.id=t2.id; count ---------- 10000000 (1 row) Time: 10584.519 ms (00:10.585)
下面是OLTP的示例:
lightdb@postgres=# explain (analyze,verbose) select * from t where id in (1,2,3,4); QUERY PLAN -------------------------------------------------------------------------------------------------------------------- Gather (cost=10004.03..10008.49 rows=4 width=4) (actual time=16.539..19.560 rows=4 loops=1) Output: id Workers Planned: 1 Workers Launched: 1 Single Copy: true -> Bitmap Heap Scan on public.t (cost=4.03..8.09 rows=4 width=4) (actual time=0.378..0.382 rows=4 loops=1) Output: id Recheck Cond: (t.id = ANY ('{1,2,3,4}'::integer[])) Worker 0: actual time=0.378..0.382 rows=4 loops=1 -> Bitmap Index Scan on idx_t (cost=0.00..4.03 rows=4 width=0) (actual time=0.296..0.296 rows=4 loops=1) Index Cond: (t.id = ANY ('{1,2,3,4}'::integer[])) Worker 0: actual time=0.296..0.296 rows=4 loops=1 Planning Time: 0.285 ms Execution Time: 19.812 ms (14 rows) Time: 21.320 ms
===============
对应的btree如下:
lightdb@postgres=# create index idx_t on t(id); CREATE INDEX Time: 7298.174 ms (00:07.298) lightdb@postgres=# explain (analyze,verbose) select * from t where id in (1,2,3,4); QUERY PLAN --------------------------------------------------------------------------------------------------------------------------- Gather (cost=10000.43..10007.21 rows=4 width=4) (actual time=9.571..11.403 rows=4 loops=1) Output: id Workers Planned: 1 Workers Launched: 1 Single Copy: true -> Index Only Scan using idx_t on public.t (cost=0.43..6.81 rows=4 width=4) (actual time=0.176..0.185 rows=4 loops=1) Output: id Index Cond: (t.id = ANY ('{1,2,3,4}'::integer[])) Heap Fetches: 4 Worker 0: actual time=0.176..0.185 rows=4 loops=1 Planning Time: 0.380 ms Execution Time: 11.588 ms (12 rows) Time: 12.863 ms lightdb@postgres=# select count(1) from t t1,t t2 where t1.id=t2.id; count ---------- 10000000 (1 row) Time: 6334.176 ms (00:06.334) lightdb@postgres=# explain select count(1) from t t1,t t2 where t1.id=t2.id; QUERY PLAN ------------------------------------------------------------------------------------------------------------------- Finalize Aggregate (cost=525399.81..525399.82 rows=1 width=8) -> Gather (cost=525399.69..525399.80 rows=1 width=8) Workers Planned: 1 -> Partial Aggregate (cost=515399.69..515399.70 rows=1 width=8) -> Merge Join (cost=0.87..500693.81 rows=5882353 width=0) Merge Cond: (t1.id = t2.id) -> Parallel Index Only Scan using idx_t on t t1 (cost=0.43..180493.96 rows=5882353 width=4) -> Index Only Scan using idx_t on t t2 (cost=0.43..221670.43 rows=10000000 width=4) (8 rows) Time: 1.441 ms ====== 换成hash join lightdb@postgres=# set enable_mergejoin to off; SET Time: 0.681 ms lightdb@postgres=# explain select count(1) from t t1,t t2 where t1.id=t2.id; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------- Finalize Aggregate (cost=644560.95..644560.96 rows=1 width=8) -> Gather (cost=644560.83..644560.94 rows=1 width=8) Workers Planned: 1 -> Partial Aggregate (cost=634560.83..634560.84 rows=1 width=8) -> Parallel Hash Join (cost=277001.81..619854.95 rows=5882353 width=0) Hash Cond: (t1.id = t2.id) -> Parallel Index Only Scan using idx_t on t t1 (cost=0.43..180493.96 rows=5882353 width=4) -> Parallel Hash (cost=180493.96..180493.96 rows=5882353 width=4) -> Parallel Index Only Scan using idx_t on t t2 (cost=0.43..180493.96 rows=5882353 width=4) (9 rows) Time: 1.492 ms lightdb@postgres=# select count(1) from t t1,t t2 where t1.id=t2.id; count ---------- 10000000 (1 row) Time: 9897.255 ms (00:09.897)
无论是否添加唯一索引都差不多。如下:
lightdb@postgres=# alter table t add unique(id); ALTER TABLE Time: 6454.915 ms (00:06.455) lightdb@postgres=# create index idx_t on t using hash(id); CREATE INDEX Time: 24519.773 ms (00:24.520) lightdb@postgres=# select count(1) from t t1,t t2 where t1.id=t2.id; count ---------- 10000000 (1 row) Time: 10515.582 ms (00:10.516)
上面是int类型的情况,下面来看下字符串类型值相对比较长的情况。
-- hash索引
lightdb@postgres=# alter table t add column v varchar(100); ALTER TABLE Time: 1.614 ms lightdb@postgres=# update t set v = uuid(); UPDATE 10000000 Time: 74764.381 ms (01:14.764) lightdb@postgres=# drop index idx_v ; DROP INDEX Time: 358.055 ms lightdb@postgres=# create index idx_v on t using hash(v); CREATE INDEX Time: 26212.556 ms (00:26.213) lightdb@postgres=# explain (analyze,verbose) select count(1) from t t1,t t2 where t1.v=t2.v; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------------------- Gather (cost=1109415.94..1109416.05 rows=1 width=8) (actual time=16827.289..16843.024 rows=1 loops=1) Output: (count(1)) Workers Planned: 1 Workers Launched: 1 Single Copy: true -> Aggregate (cost=1099415.94..1099415.95 rows=1 width=8) (actual time=16806.915..16806.986 rows=1 loops=1) Output: count(1) Worker 0: actual time=16806.915..16806.986 rows=1 loops=1 -> Hash Join (cost=396582.98..1074415.94 rows=9999999 width=0) (actual time=5346.963..16158.927 rows=10000000 loops=1) Hash Cond: ((t1.v)::text = (t2.v)::text) Worker 0: actual time=5346.963..16158.927 rows=10000000 loops=1 -> Seq Scan on public.t t1 (cost=0.00..193457.99 rows=9999999 width=37) (actual time=0.208..2068.391 rows=10000000 loops=1) Output: t1.v Worker 0: actual time=0.208..2068.391 rows=10000000 loops=1 -> Hash (cost=193457.99..193457.99 rows=9999999 width=37) (actual time=5334.495..5334.558 rows=10000000 loops=1) Output: t2.v Buckets: 65536 Batches: 256 Memory Usage: 3149kB Worker 0: actual time=5334.495..5334.558 rows=10000000 loops=1 -> Seq Scan on public.t t2 (cost=0.00..193457.99 rows=9999999 width=37) (actual time=0.054..2165.095 rows=10000000 loops=1) Output: t2.v Worker 0: actual time=0.054..2165.095 rows=10000000 loops=1 Planning Time: 1.016 ms Execution Time: 16843.381 ms (23 rows) Time: 16845.926 ms (00:16.846) -- 因为内存小,batches交互多,所以增加work_mem消除I/O影响。 lightdb@postgres=# set work_mem to '128MB'; SET Time: 0.608 ms lightdb@postgres=# explain (analyze,verbose) select count(1) from t t1,t t2 where t1.v=t2.v; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------------------- Gather (cost=1109415.94..1109416.05 rows=1 width=8) (actual time=22352.372..22388.694 rows=1 loops=1) Output: (count(1)) Workers Planned: 1 Workers Launched: 1 Single Copy: true -> Aggregate (cost=1099415.94..1099415.95 rows=1 width=8) (actual time=22335.328..22335.363 rows=1 loops=1) Output: count(1) Worker 0: actual time=22335.328..22335.363 rows=1 loops=1 -> Hash Join (cost=396582.98..1074415.94 rows=9999999 width=0) (actual time=5627.623..21684.815 rows=10000000 loops=1) Hash Cond: ((t1.v)::text = (t2.v)::text) Worker 0: actual time=5627.623..21684.815 rows=10000000 loops=1 -> Seq Scan on public.t t1 (cost=0.00..193457.99 rows=9999999 width=37) (actual time=0.262..1901.155 rows=10000000 loops=1) Output: t1.v Worker 0: actual time=0.262..1901.155 rows=10000000 loops=1 -> Hash (cost=193457.99..193457.99 rows=9999999 width=37) (actual time=5600.002..5600.028 rows=10000000 loops=1) Output: t2.v Buckets: 2097152 Batches: 8 Memory Usage: 100708kB Worker 0: actual time=5600.002..5600.028 rows=10000000 loops=1 -> Seq Scan on public.t t2 (cost=0.00..193457.99 rows=9999999 width=37) (actual time=0.188..2076.519 rows=10000000 loops=1) Output: t2.v Worker 0: actual time=0.188..2076.519 rows=10000000 loops=1 Planning Time: 0.415 ms Execution Time: 22388.947 ms (23 rows) Time: 22390.522 ms (00:22.391) lightdb@postgres=# set work_mem to '1280MB'; SET Time: 0.651 ms lightdb@postgres=# explain (analyze,verbose) select count(1) from t t1,t t2 where t1.v=t2.v; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------------------- Gather (cost=796915.94..796916.05 rows=1 width=8) (actual time=18028.588..18412.918 rows=1 loops=1) Output: (count(1)) Workers Planned: 1 Workers Launched: 1 Single Copy: true -> Aggregate (cost=786915.94..786915.95 rows=1 width=8) (actual time=18017.839..18017.861 rows=1 loops=1) Output: count(1) Worker 0: actual time=18017.839..18017.861 rows=1 loops=1 -> Hash Join (cost=318457.98..761915.94 rows=9999999 width=0) (actual time=7028.907..17394.653 rows=10000000 loops=1) Hash Cond: ((t1.v)::text = (t2.v)::text) Worker 0: actual time=7028.907..17394.653 rows=10000000 loops=1 -> Seq Scan on public.t t1 (cost=0.00..193457.99 rows=9999999 width=37) (actual time=0.141..1507.013 rows=10000000 loops=1) Output: t1.id, t1.v Worker 0: actual time=0.141..1507.013 rows=10000000 loops=1 -> Hash (cost=193457.99..193457.99 rows=9999999 width=37) (actual time=6881.732..6881.746 rows=10000000 loops=1) Output: t2.v Buckets: 16777216 Batches: 1 Memory Usage: 804901kB Worker 0: actual time=6881.732..6881.746 rows=10000000 loops=1 -> Seq Scan on public.t t2 (cost=0.00..193457.99 rows=9999999 width=37) (actual time=0.128..1781.326 rows=10000000 loops=1) Output: t2.v Worker 0: actual time=0.128..1781.326 rows=10000000 loops=1 Planning Time: 0.327 ms Execution Time: 18413.120 ms (23 rows) Time: 18414.684 ms (00:18.415)
-- b树索引 lightdb@postgres=# drop index idx_v ; DROP INDEX Time: 133.610 ms lightdb@postgres=# create index idx_v on t(v); CREATE INDEX Time: 99610.086 ms (01:39.610) lightdb@postgres=# explain (analyze,verbose) select count(1) from t t1,t t2 where t1.v=t2.v; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Finalize Aggregate (cost=635535.74..635535.76 rows=1 width=8) (actual time=55059.701..55482.026 rows=1 loops=1) Output: count(1) -> Gather (cost=635535.53..635535.74 rows=2 width=8) (actual time=54912.085..55481.978 rows=3 loops=1) Output: (PARTIAL count(1)) Workers Planned: 2 Workers Launched: 2 -> Partial Aggregate (cost=625535.53..625535.54 rows=1 width=8) (actual time=54888.937..54888.960 rows=1 loops=3) Output: PARTIAL count(1) Worker 0: actual time=54886.263..54886.305 rows=1 loops=1 Worker 1: actual time=54868.875..54868.890 rows=1 loops=1 -> Parallel Hash Join (cost=309296.10..615118.86 rows=4166666 width=0) (actual time=26238.000..54499.870 rows=3333333 loops=3) Hash Cond: ((t1.v)::text = (t2.v)::text) Worker 0: actual time=26224.367..54500.826 rows=3357480 loops=1 Worker 1: actual time=26223.631..54484.862 rows=3362240 loops=1 -> Parallel Index Only Scan using idx_v on public.t t1 (cost=0.56..257212.22 rows=4166666 width=37) (actual time=0.112..23401.164 rows=3333333 loops=3) Output: t1.v Heap Fetches: 10000000 Worker 0: actual time=0.114..23476.367 rows=3357480 loops=1 Worker 1: actual time=0.100..23249.643 rows=3362240 loops=1 -> Parallel Hash (cost=257212.22..257212.22 rows=4166666 width=37) (actual time=26176.344..26176.362 rows=3333333 loops=3) Output: t2.v Buckets: 16777216 Batches: 1 Memory Usage: 835968kB Worker 0: actual time=26223.984..26224.022 rows=3279920 loops=1 Worker 1: actual time=26223.158..26223.169 rows=3353980 loops=1 -> Parallel Index Only Scan using idx_v on public.t t2 (cost=0.56..257212.22 rows=4166666 width=37) (actual time=0.198..23040.507 rows=3333333 loops=3) Output: t2.v Heap Fetches: 10000000 Worker 0: actual time=0.136..23046.634 rows=3279920 loops=1 Worker 1: actual time=0.327..22978.942 rows=3353980 loops=1 Planning Time: 0.559 ms Execution Time: 55482.322 ms (31 rows) Time: 55484.995 ms (00:55.485)
从上可知,在字符串下,hash索引相比btree索引性能高3倍以上。hash索引还是很有价值的。
标签:index,rows,hash,lightdb,..,cost,time,loops,actual From: https://www.cnblogs.com/lightdb/p/18237588