首页 > 其他分享 >lightdb hash index的性能和限制

lightdb hash index的性能和限制

时间:2024-06-07 17:35:07浏览次数:29  
标签:index rows hash lightdb .. cost time loops actual

  除了btree外,lightdb是支持hash index的,但是总体来说支持的特性范围均不如btree索引,比如parallel没有btree索引智能,不支持=之外的操作,不支持bitmap index scan,不支持哈希唯一索引(但可以通过unique not null约束变量实现)等等,但在内存存储不使用固定长度的场景下还是很有优势的。如下:

lightdb@postgres=# create table t(id int);
WARNING:  LightDB DDL check warn! no primary key!
DETAIL:  If your system do not have data replication requirment, just ignore it
CREATE TABLE
lightdb@postgres=# create index idx_t on t using hash(id);
CREATE INDEX

lightdb@postgres=# insert into t select id from generate_series(1,10000000) id;
INSERT 0 10000000

lightdb@postgres=# explain select count(1) from t t1,t t2 where t1.id=t2.id;
                                       QUERY PLAN                                       
----------------------------------------------------------------------------------------
 Gather  (cost=854748.00..854748.11 rows=1 width=8)
   Workers Planned: 1
   Single Copy: true
   ->  Aggregate  (cost=844748.00..844748.01 rows=1 width=8)
         ->  Hash Join  (cost=308311.00..819748.00 rows=10000000 width=0)
               Hash Cond: (t1.id = t2.id)
               ->  Seq Scan on t t1  (cost=0.00..144248.00 rows=10000000 width=4)
               ->  Hash  (cost=144248.00..144248.00 rows=10000000 width=4)
                     ->  Seq Scan on t t2  (cost=0.00..144248.00 rows=10000000 width=4)
(9 rows)

Time: 0.953 ms
lightdb@postgres=# select count(1) from t t1,t t2 where t1.id=t2.id;
  count   
----------
 10000000
(1 row)

Time: 10584.519 ms (00:10.585)

下面是OLTP的示例:

lightdb@postgres=# explain (analyze,verbose) select * from t where id in (1,2,3,4);
                                                     QUERY PLAN                                                     
--------------------------------------------------------------------------------------------------------------------
 Gather  (cost=10004.03..10008.49 rows=4 width=4) (actual time=16.539..19.560 rows=4 loops=1)
   Output: id
   Workers Planned: 1
   Workers Launched: 1
   Single Copy: true
   ->  Bitmap Heap Scan on public.t  (cost=4.03..8.09 rows=4 width=4) (actual time=0.378..0.382 rows=4 loops=1)
         Output: id
         Recheck Cond: (t.id = ANY ('{1,2,3,4}'::integer[]))
         Worker 0:  actual time=0.378..0.382 rows=4 loops=1
         ->  Bitmap Index Scan on idx_t  (cost=0.00..4.03 rows=4 width=0) (actual time=0.296..0.296 rows=4 loops=1)
               Index Cond: (t.id = ANY ('{1,2,3,4}'::integer[]))
               Worker 0:  actual time=0.296..0.296 rows=4 loops=1
 Planning Time: 0.285 ms
 Execution Time: 19.812 ms
(14 rows)

Time: 21.320 ms

===============

对应的btree如下:

lightdb@postgres=# create index idx_t on t(id);
CREATE INDEX
Time: 7298.174 ms (00:07.298)
lightdb@postgres=# explain (analyze,verbose) select * from t where id in (1,2,3,4);
                                                        QUERY PLAN                                                         
---------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=10000.43..10007.21 rows=4 width=4) (actual time=9.571..11.403 rows=4 loops=1)
   Output: id
   Workers Planned: 1
   Workers Launched: 1
   Single Copy: true
   ->  Index Only Scan using idx_t on public.t  (cost=0.43..6.81 rows=4 width=4) (actual time=0.176..0.185 rows=4 loops=1)
         Output: id
         Index Cond: (t.id = ANY ('{1,2,3,4}'::integer[]))
         Heap Fetches: 4
         Worker 0:  actual time=0.176..0.185 rows=4 loops=1
 Planning Time: 0.380 ms
 Execution Time: 11.588 ms
(12 rows)

Time: 12.863 ms

lightdb@postgres=# select count(1) from t t1,t t2 where t1.id=t2.id;
  count   
----------
 10000000
(1 row)

Time: 6334.176 ms (00:06.334)

lightdb@postgres=# explain select count(1) from t t1,t t2 where t1.id=t2.id;
                                                    QUERY PLAN                                                     
-------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=525399.81..525399.82 rows=1 width=8)
   ->  Gather  (cost=525399.69..525399.80 rows=1 width=8)
         Workers Planned: 1
         ->  Partial Aggregate  (cost=515399.69..515399.70 rows=1 width=8)
               ->  Merge Join  (cost=0.87..500693.81 rows=5882353 width=0)
                     Merge Cond: (t1.id = t2.id)
                     ->  Parallel Index Only Scan using idx_t on t t1  (cost=0.43..180493.96 rows=5882353 width=4)
                     ->  Index Only Scan using idx_t on t t2  (cost=0.43..221670.43 rows=10000000 width=4)
(8 rows)

Time: 1.441 ms

====== 换成hash join
lightdb@postgres=# set enable_mergejoin to off;
SET
Time: 0.681 ms
lightdb@postgres=# explain select count(1) from t t1,t t2 where t1.id=t2.id;
                                                       QUERY PLAN                                                        
-------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=644560.95..644560.96 rows=1 width=8)
   ->  Gather  (cost=644560.83..644560.94 rows=1 width=8)
         Workers Planned: 1
         ->  Partial Aggregate  (cost=634560.83..634560.84 rows=1 width=8)
               ->  Parallel Hash Join  (cost=277001.81..619854.95 rows=5882353 width=0)
                     Hash Cond: (t1.id = t2.id)
                     ->  Parallel Index Only Scan using idx_t on t t1  (cost=0.43..180493.96 rows=5882353 width=4)
                     ->  Parallel Hash  (cost=180493.96..180493.96 rows=5882353 width=4)
                           ->  Parallel Index Only Scan using idx_t on t t2  (cost=0.43..180493.96 rows=5882353 width=4)
(9 rows)

Time: 1.492 ms
lightdb@postgres=# select count(1) from t t1,t t2 where t1.id=t2.id;
  count   
----------
 10000000
(1 row)

Time: 9897.255 ms (00:09.897)

无论是否添加唯一索引都差不多。如下:

lightdb@postgres=# alter table t add unique(id);
ALTER TABLE
Time: 6454.915 ms (00:06.455)
lightdb@postgres=# create index idx_t on t using hash(id);
CREATE INDEX
Time: 24519.773 ms (00:24.520)
lightdb@postgres=# select count(1) from t t1,t t2 where t1.id=t2.id;
  count   
----------
 10000000
(1 row)

Time: 10515.582 ms (00:10.516)

上面是int类型的情况,下面来看下字符串类型值相对比较长的情况。

-- hash索引
lightdb@postgres=# alter table t add column v varchar(100); ALTER TABLE Time: 1.614 ms lightdb@postgres=# update t set v = uuid(); UPDATE 10000000 Time: 74764.381 ms (01:14.764) lightdb@postgres=# drop index idx_v ; DROP INDEX Time: 358.055 ms lightdb@postgres=# create index idx_v on t using hash(v); CREATE INDEX Time: 26212.556 ms (00:26.213) lightdb@postgres=# explain (analyze,verbose) select count(1) from t t1,t t2 where t1.v=t2.v; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------------------- Gather (cost=1109415.94..1109416.05 rows=1 width=8) (actual time=16827.289..16843.024 rows=1 loops=1) Output: (count(1)) Workers Planned: 1 Workers Launched: 1 Single Copy: true -> Aggregate (cost=1099415.94..1099415.95 rows=1 width=8) (actual time=16806.915..16806.986 rows=1 loops=1) Output: count(1) Worker 0: actual time=16806.915..16806.986 rows=1 loops=1 -> Hash Join (cost=396582.98..1074415.94 rows=9999999 width=0) (actual time=5346.963..16158.927 rows=10000000 loops=1) Hash Cond: ((t1.v)::text = (t2.v)::text) Worker 0: actual time=5346.963..16158.927 rows=10000000 loops=1 -> Seq Scan on public.t t1 (cost=0.00..193457.99 rows=9999999 width=37) (actual time=0.208..2068.391 rows=10000000 loops=1) Output: t1.v Worker 0: actual time=0.208..2068.391 rows=10000000 loops=1 -> Hash (cost=193457.99..193457.99 rows=9999999 width=37) (actual time=5334.495..5334.558 rows=10000000 loops=1) Output: t2.v Buckets: 65536 Batches: 256 Memory Usage: 3149kB Worker 0: actual time=5334.495..5334.558 rows=10000000 loops=1 -> Seq Scan on public.t t2 (cost=0.00..193457.99 rows=9999999 width=37) (actual time=0.054..2165.095 rows=10000000 loops=1) Output: t2.v Worker 0: actual time=0.054..2165.095 rows=10000000 loops=1 Planning Time: 1.016 ms Execution Time: 16843.381 ms (23 rows) Time: 16845.926 ms (00:16.846) -- 因为内存小,batches交互多,所以增加work_mem消除I/O影响。 lightdb@postgres=# set work_mem to '128MB'; SET Time: 0.608 ms lightdb@postgres=# explain (analyze,verbose) select count(1) from t t1,t t2 where t1.v=t2.v; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------------------- Gather (cost=1109415.94..1109416.05 rows=1 width=8) (actual time=22352.372..22388.694 rows=1 loops=1) Output: (count(1)) Workers Planned: 1 Workers Launched: 1 Single Copy: true -> Aggregate (cost=1099415.94..1099415.95 rows=1 width=8) (actual time=22335.328..22335.363 rows=1 loops=1) Output: count(1) Worker 0: actual time=22335.328..22335.363 rows=1 loops=1 -> Hash Join (cost=396582.98..1074415.94 rows=9999999 width=0) (actual time=5627.623..21684.815 rows=10000000 loops=1) Hash Cond: ((t1.v)::text = (t2.v)::text) Worker 0: actual time=5627.623..21684.815 rows=10000000 loops=1 -> Seq Scan on public.t t1 (cost=0.00..193457.99 rows=9999999 width=37) (actual time=0.262..1901.155 rows=10000000 loops=1) Output: t1.v Worker 0: actual time=0.262..1901.155 rows=10000000 loops=1 -> Hash (cost=193457.99..193457.99 rows=9999999 width=37) (actual time=5600.002..5600.028 rows=10000000 loops=1) Output: t2.v Buckets: 2097152 Batches: 8 Memory Usage: 100708kB Worker 0: actual time=5600.002..5600.028 rows=10000000 loops=1 -> Seq Scan on public.t t2 (cost=0.00..193457.99 rows=9999999 width=37) (actual time=0.188..2076.519 rows=10000000 loops=1) Output: t2.v Worker 0: actual time=0.188..2076.519 rows=10000000 loops=1 Planning Time: 0.415 ms Execution Time: 22388.947 ms (23 rows) Time: 22390.522 ms (00:22.391) lightdb@postgres=# set work_mem to '1280MB'; SET Time: 0.651 ms lightdb@postgres=# explain (analyze,verbose) select count(1) from t t1,t t2 where t1.v=t2.v; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------------------- Gather (cost=796915.94..796916.05 rows=1 width=8) (actual time=18028.588..18412.918 rows=1 loops=1) Output: (count(1)) Workers Planned: 1 Workers Launched: 1 Single Copy: true -> Aggregate (cost=786915.94..786915.95 rows=1 width=8) (actual time=18017.839..18017.861 rows=1 loops=1) Output: count(1) Worker 0: actual time=18017.839..18017.861 rows=1 loops=1 -> Hash Join (cost=318457.98..761915.94 rows=9999999 width=0) (actual time=7028.907..17394.653 rows=10000000 loops=1) Hash Cond: ((t1.v)::text = (t2.v)::text) Worker 0: actual time=7028.907..17394.653 rows=10000000 loops=1 -> Seq Scan on public.t t1 (cost=0.00..193457.99 rows=9999999 width=37) (actual time=0.141..1507.013 rows=10000000 loops=1) Output: t1.id, t1.v Worker 0: actual time=0.141..1507.013 rows=10000000 loops=1 -> Hash (cost=193457.99..193457.99 rows=9999999 width=37) (actual time=6881.732..6881.746 rows=10000000 loops=1) Output: t2.v Buckets: 16777216 Batches: 1 Memory Usage: 804901kB Worker 0: actual time=6881.732..6881.746 rows=10000000 loops=1 -> Seq Scan on public.t t2 (cost=0.00..193457.99 rows=9999999 width=37) (actual time=0.128..1781.326 rows=10000000 loops=1) Output: t2.v Worker 0: actual time=0.128..1781.326 rows=10000000 loops=1 Planning Time: 0.327 ms Execution Time: 18413.120 ms (23 rows) Time: 18414.684 ms (00:18.415)
-- b树索引
lightdb@postgres=# drop index idx_v ;
DROP INDEX
Time: 133.610 ms
lightdb@postgres=# create index idx_v on t(v);
CREATE INDEX
Time: 99610.086 ms (01:39.610)
lightdb@postgres=# explain (analyze,verbose) select count(1) from t t1,t t2 where t1.v=t2.v;
                                                                                     QUERY PLAN                                                                                      
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=635535.74..635535.76 rows=1 width=8) (actual time=55059.701..55482.026 rows=1 loops=1)
   Output: count(1)
   ->  Gather  (cost=635535.53..635535.74 rows=2 width=8) (actual time=54912.085..55481.978 rows=3 loops=1)
         Output: (PARTIAL count(1))
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial Aggregate  (cost=625535.53..625535.54 rows=1 width=8) (actual time=54888.937..54888.960 rows=1 loops=3)
               Output: PARTIAL count(1)
               Worker 0:  actual time=54886.263..54886.305 rows=1 loops=1
               Worker 1:  actual time=54868.875..54868.890 rows=1 loops=1
               ->  Parallel Hash Join  (cost=309296.10..615118.86 rows=4166666 width=0) (actual time=26238.000..54499.870 rows=3333333 loops=3)
                     Hash Cond: ((t1.v)::text = (t2.v)::text)
                     Worker 0:  actual time=26224.367..54500.826 rows=3357480 loops=1
                     Worker 1:  actual time=26223.631..54484.862 rows=3362240 loops=1
                     ->  Parallel Index Only Scan using idx_v on public.t t1  (cost=0.56..257212.22 rows=4166666 width=37) (actual time=0.112..23401.164 rows=3333333 loops=3)
                           Output: t1.v
                           Heap Fetches: 10000000
                           Worker 0:  actual time=0.114..23476.367 rows=3357480 loops=1
                           Worker 1:  actual time=0.100..23249.643 rows=3362240 loops=1
                     ->  Parallel Hash  (cost=257212.22..257212.22 rows=4166666 width=37) (actual time=26176.344..26176.362 rows=3333333 loops=3)
                           Output: t2.v
                           Buckets: 16777216  Batches: 1  Memory Usage: 835968kB
                           Worker 0:  actual time=26223.984..26224.022 rows=3279920 loops=1
                           Worker 1:  actual time=26223.158..26223.169 rows=3353980 loops=1
                           ->  Parallel Index Only Scan using idx_v on public.t t2  (cost=0.56..257212.22 rows=4166666 width=37) (actual time=0.198..23040.507 rows=3333333 loops=3)
                                 Output: t2.v
                                 Heap Fetches: 10000000
                                 Worker 0:  actual time=0.136..23046.634 rows=3279920 loops=1
                                 Worker 1:  actual time=0.327..22978.942 rows=3353980 loops=1
 Planning Time: 0.559 ms
 Execution Time: 55482.322 ms
(31 rows)

Time: 55484.995 ms (00:55.485)

从上可知,在字符串下,hash索引相比btree索引性能高3倍以上。hash索引还是很有价值的。

 

标签:index,rows,hash,lightdb,..,cost,time,loops,actual
From: https://www.cnblogs.com/lightdb/p/18237588

相关文章

  • C++数据结构之:哈希表Hash
    摘要:  it人员无论是使用哪种高级语言开发东东,想要更高效有层次的开发程序的话都躲不开三件套:数据结构,算法和设计模式。数据结构是相互之间存在一种或多种特定关系的数据元素的集合,即带“结构”的数据元素的集合,“结构”就是指数据元素之间存在的关系,分为逻辑结构和存储......
  • msSQL 语句查询时 charindex()方法的妙用
    CHARINDEX(expressionToFind,expressionToSearch[,start_location])参数说明:expressionToFind:必需,要查找的字符串。expressionToSearch:必需,要进行查找的字符串表达式。start_location:可选,指定搜索的起始位置,默认为1。如果指定了起始位置,则从该位置开始搜索。list变......
  • java ArrayIndexOutOfBoundsException数组越界异常
    Java中的ArrayIndexOutOfBoundsException(数组越界异常)是一种运行时异常,表示访问了数组的非法索引位置。在数组中,索引从0开始,并以数组长度减一为上限。如果使用了小于0或大于等于数组长度的索引,就会抛出ArrayIndexOutOfBoundsException异常。以下是一个示例代码,演示了这个异常......
  • FFT&hash
    1.FFT常看常新啊,比如突然发现complex比手写快!注意实部和虚部的函数分别是real()和imag()#include<bits/stdc++.h>usingnamespacestd;#definerep(i,j,k)for(inti=(j);i<=(k);++i)#definedow(i,j,k)for(inti=(j);i>=(k);--i)#defineprpair#definepbpush_back#d......
  • 【图解】HashMap1.7 头插法造成死循环
    1.概述HashMap1.7当中,扩容的时候,采用的是头插法转移结点,在多线程并发的情况下会造成链表死循环的问题。HashMap1.8中改为了尾插法,解决扩容时线程并发产生的死循环问题。2.图解假设有两个线程,线程1和线程2,两个线程进行hashMap的put操作,触发了扩容。下面是扩容的时候结点转移的......
  • C# Parallel foreach Parallel Source array was not long enough. Check srcIndex an
    //Indexwasoutsidetheboundsofthearray.//Sourcearraywasnotlongenough.ChecksrcIndexandlength,andthearray'slowerbounds//usingSystem;usingSystem.Collections.Concurrent;usingSystem.Collections.Generic;usingSystem.Linq;usingSy......
  • 为什么MySQL的information_schema.tables字段index_length显示为0?
     为什么MySQL的information_schema.tables字段index_length显示为0? 测试版本为:MySQL社区版8.0.36 分情况1:innodb引擎的表是索引组织表,按照主键进行顺序存放。则表是索引,索引是表,index_length不会有值。如下,创建一个表,并追加主键,查看索引情况。(root@localhost11:02:2......
  • 【git commit错误】error: bad signature 0x00000000 fatal: index file corrupt原因
    解决Git错误:error:badsignature0x00000000fatal:indexfilecorrupt原因分析及解决方案在使用Git进行版本控制时,可能会遇到各种错误。其中之一是关于索引文件(通常为.git/index)损坏的错误,这会导致无法正常提交更改。基础知识Git索引:Git使用一个索引文件来跟踪工作目......
  • Redis之hash
    Redis哈希(Hash)hash的格式也是键值对key:map,只不过他的值是map集合。key:{key:vlaue}案例127.0.0.1:6379>HSETmyhashfield1lili#set一个具体的key-value(integer)1127.0.0.1:6379>HGETmyhashfield1"lili"127.0.0.1:6379>HSETmyhashf......
  • JS面试题:hash和history的区别
    一、hash模式和history模式的介绍由于Vue项目为单页面应用,所以整个项目在开发和构建过程中,仅存在一个HTML物理文件。通过路由系统可以实现将项目的组件与可访问的URL路径进行绑定。由于Vue项目只有一个HTML物理文件,切换页面时既需要让访问的URL路径发生变化,又不能触发H......