首页 > 其他分享 >exist和left join 性能对比

exist和left join 性能对比

时间:2023-09-18 15:22:30浏览次数:34  
标签:rows join family .. app a1 exist id left

今天遇到一个性能问题,再调优过程中发现耗时最久的计划是exist 部分涉及的三个表。

然后计划用left join 来替换exist,然后查询了很多资料,大部分都说exist和left join 性能差不多。 为了验证这一结论进行了如下实验

步骤如下

1、创建测试表

drop table app_family;
CREATE TABLE app_family (
"family_id" character varying(32 char) NOT NULL,
"application_id" character varying(32 char) NULL,
"family_number" character varying(50 char) ,
"household_register_number" character varying(50 char),
"poverty_reason" character varying(32 char),
CONSTRAINT "pk_app_family_idpk" PRIMARY KEY (family_id));
insert into app_family select generate_series(1,1000000),generate_series(1,1000000),'aaaa','aaa','bbb' from dual ;

create table app_family2 as select * from app_family;

create table app_memeber as select * from app_family;

2、验证两张表join和exist 性能对比

语句1、两张表exist

explain analyze select a1.application_id,a1.family_id from app_family a1 where
a1.family_id >1000 and
EXISTS(
SELECT
1
FROM
app_family2 a2
WHERE
a2.application_id=a1.application_id
and a2.family_id > 500000
)

总计用时646.203 ms

 ----------------------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=16927.11..44466.84 rows=111111 width=12) (actual time=354.314..621.714 rows=500000 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Hash Semi Join  (cost=15927.11..32355.74 rows=46296 width=12) (actual time=355.657..512.049 rows=166667 loops=3)
         Hash Cond: ((a1.application_id)::text = (a2.application_id)::text)
         ->  Parallel Seq Scan on app_family a1  (cost=0.00..13648.00 rows=138889 width=12) (actual time=0.222..111.618 rows=333000 loops=3)
               Filter: ((family_id)::integer > 1000)
               Rows Removed by Filter: 333
         ->  Parallel Hash  (cost=13648.00..13648.00 rows=138889 width=6) (actual time=149.203..149.204 rows=166667 loops=3)
               Buckets: 131072  Batches: 8  Memory Usage: 3520kB
               ->  Parallel Seq Scan on app_family2 a2  (cost=0.00..13648.00 rows=138889 width=6) (actual time=48.576..109.251 rows=166667 loops=3)
                     Filter: ((family_id)::integer > 500000)
                     Rows Removed by Filter: 166667
 Planning Time: 0.145 ms
 Execution Time: 645.095 ms
(15 rows)

Time: 646.203 ms
kingbase=#

语句2 两张表join

explain analyze select a1.application_id,a1.family_id from app_family a1 LEFT JOIN app_family2 a2 ON a2.application_id=a1.application_id
WHERE a1.family_id >1000 AND a2.family_id > 500000

总计执行时间624.211 ms

---------------------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=16927.11..44300.95 rows=111111 width=12) (actual time=349.752..601.304 rows=500000 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Hash Join  (cost=15927.11..32189.85 rows=46296 width=12) (actual time=337.548..508.139 rows=166667 loops=3)
         Hash Cond: ((a1.application_id)::text = (a2.application_id)::text)
         ->  Parallel Seq Scan on app_family a1  (cost=0.00..13648.00 rows=138889 width=12) (actual time=0.087..111.949 rows=333000 loops=3)
               Filter: ((family_id)::integer > 1000)
               Rows Removed by Filter: 333
         ->  Parallel Hash  (cost=13648.00..13648.00 rows=138889 width=6) (actual time=131.718..131.719 rows=166667 loops=3)
               Buckets: 131072  Batches: 8  Memory Usage: 3488kB
               ->  Parallel Seq Scan on app_family2 a2  (cost=0.00..13648.00 rows=138889 width=6) (actual time=31.730..90.917 rows=166667 loops=3)
                     Filter: ((family_id)::integer > 500000)
                     Rows Removed by Filter: 166667
 Planning Time: 0.093 ms
 Execution Time: 623.465 ms
(15 rows)

Time: 624.211 ms

两张表场景总结

针对两张表的对比可以发现join还相对满了10几ms但是总的来说两边 差异不大。所以再两张表的关联情况下 join和exist 性能相近。

3、验证3张表join和exist 性能对比

语句1 三张表exist

本场景最开始执行时 exit 用户6 s多,原因时用到了内存排序,后来调整了work_mem 排除了内存排序的影响,最终执行时间

2911.146 ms

explain analyze select a1.application_id,a1.family_id from app_family a1 ,app_family2 a2 where
a1.family_id >1000 and a2.family_id < 900000 and
EXISTS(
SELECT
1
FROM
app_memeber m
WHERE
m.application_id=a1.application_id
and m.family_id=a2.family_id
)

------------------------------------------------------------------------------------------------------------------------------------------------------
--
 Gather  (cost=61282.11..88664.67 rows=111111 width=12) (actual time=2112.079..2847.233 rows=898999 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Hash Join  (cost=60282.11..76553.57 rows=46296 width=12) (actual time=2119.345..2705.935 rows=299666 loops=3)
         Hash Cond: ((m.family_id)::text = (a2.family_id)::text)
         ->  Hash Join  (cost=44898.00..60455.72 rows=138889 width=18) (actual time=1885.923..2264.850 rows=333000 loops=3)
               Hash Cond: ((a1.application_id)::text = (m.application_id)::text)
               ->  Parallel Seq Scan on app_family a1  (cost=0.00..13648.00 rows=138889 width=12) (actual time=0.091..109.196 rows=333000 loops=3)
                     Filter: ((family_id)::integer > 1000)
                     Rows Removed by Filter: 333
               ->  Hash  (cost=32398.00..32398.00 rows=1000000 width=12) (actual time=1880.027..1880.028 rows=1000000 loops=3)
                     Buckets: 1048576  Batches: 1  Memory Usage: 52897kB
                     ->  HashAggregate  (cost=22398.00..32398.00 rows=1000000 width=12) (actual time=957.973..1382.683 rows=1000000 loops=3)
                           Group Key: (m.application_id)::text, (m.family_id)::text
                           ->  Seq Scan on app_memeber m  (cost=0.00..17398.00 rows=1000000 width=12) (actual time=0.047..247.902 rows=1000000 loops=3
)
         ->  Parallel Hash  (cost=13648.00..13648.00 rows=138889 width=6) (actual time=231.705..231.706 rows=300000 loops=3)
               Buckets: 1048576 (originally 524288)  Batches: 1 (originally 1)  Memory Usage: 47552kB
               ->  Parallel Seq Scan on app_family2 a2  (cost=0.00..13648.00 rows=138889 width=6) (actual time=0.039..100.756 rows=300000 loops=3)
                     Filter: ((family_id)::integer < 900000)
                     Rows Removed by Filter: 33334
 Planning Time: 0.359 ms
 Execution Time: 2911.146 ms
(22 rows)

语句2 三张表join

为了保证语句的一致性,三张表的join顺序保持和语句1的执行计划中的顺序一致,join总计用时1476.651 ms

explain analyze select a1.application_id,a1.family_id from app_family a1
left join app_memeber m on a1.application_id = m.application_id LEFT JOIN app_family2 a2 ON m.family_id = a2.family_id
WHERE a1.family_id >1000 AND a2.family_id < 900000

 Gather  (cost=32990.22..64898.93 rows=111111 width=12) (actual time=993.681..1436.895 rows=898999 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Hash Join  (cost=31990.22..52787.83 rows=46296 width=12) (actual time=982.512..1241.385 rows=299666 loops=3)
         Hash Cond: ((m.application_id)::text = (a1.application_id)::text)
         ->  Parallel Hash Join  (cost=15927.11..34245.98 rows=138889 width=6) (actual time=377.411..635.945 rows=300000 loops=3)
               Hash Cond: ((m.family_id)::text = (a2.family_id)::text)
               ->  Parallel Seq Scan on app_memeber m  (cost=0.00..11564.67 rows=416667 width=12) (actual time=0.034..59.470 rows=333333 loops=3)
               ->  Parallel Hash  (cost=13648.00..13648.00 rows=138889 width=6) (actual time=232.286..232.287 rows=300000 loops=3)
                     Buckets: 131072 (originally 131072)  Batches: 16 (originally 8)  Memory Usage: 3296kB
                     ->  Parallel Seq Scan on app_family2 a2  (cost=0.00..13648.00 rows=138889 width=6) (actual time=0.030..104.370 rows=300000 loops=
3)
                           Filter: ((family_id)::integer < 900000)
                           Rows Removed by Filter: 33334
         ->  Parallel Hash  (cost=13648.00..13648.00 rows=138889 width=12) (actual time=271.185..271.185 rows=333000 loops=3)
               Buckets: 131072 (originally 131072)  Batches: 16 (originally 8)  Memory Usage: 4032kB
               ->  Parallel Seq Scan on app_family a1  (cost=0.00..13648.00 rows=138889 width=12) (actual time=0.091..129.188 rows=333000 loops=3)
                     Filter: ((family_id)::integer > 1000)
                     Rows Removed by Filter: 333
 Planning Time: 0.140 ms
 Execution Time: 1475.305 ms
(20 rows)

Time: 1476.651 ms (00:01.477)

总结三张表场景

在三张表的场景下exist用时2911.146 ms ,join用时1476.651 ms 可见 join的顺序明显优于exist。

在三张表的场景下可以看到,针对中间表appmember扫描时, exist语句用到HashAggregate 并做了 Group Key,所以导致exist 执行时间增加。如果work_mem 配置不合适时间会更长。

标签:rows,join,family,..,app,a1,exist,id,left
From: https://www.cnblogs.com/kingbase/p/17711921.html

相关文章

  • mysql update、delete left join 操作
    1delete删除多表DELETEq,pqfromcom_purchasequoteqLEFTJOINcom_purchasequote_snapshootpqonq.id=pq.idwhereq.id=84110;删除一张表DELETEqfromcom_purchasequoteqLEFTJOINcom_purchasequote_snapshootpqonq.id=pq.idwhereq.id=82347;DELETEpqfrom......
  • wait、sleep、yield、join的区别
    两个概念1、锁队列所有需要竞争同步锁的线程都会放在锁队列中,比如当前对象的锁已经被一个线程得到,则其他线程都需要在这个锁队列中进行等待,当前面的线程释放同步锁后,锁队列中的线程去竞争同步锁,当某个线程得到后会进入就绪队列进行等待CPU资源分配。2、等待队列(wait方法)在调用......
  • MySQL中exists和in的区别
    https://blog.csdn.net/m0_50370837/article/details/124239171?ops_request_misc=&request_id=&biz_id=102&utm_term=mysql%20IN%20%E5%92%8CEXISTS%20%E7%9A%84%E5%8C%BA%E5%88%AB&utm_medium=distribute.pc_search_result.none-task-blog-2~all~sobaiduweb......
  • 双边快排的基准点和先判断left还是right问题
     前同事问了我一个双边快排的算法,他问我怎么都无法正常排序,代码如下,publicstaticvoidmain(String[]args){int[]arr=newint[]{7,3,6,4,8,9,0,22,28,2,3,79,24};arr=newint[]{4,4,6,5,3,2,8,1};System.out.println("left:"+0+"right:"......
  • 对于数组中取下标中值操作int mid=(left+right)/2的讨论
    分两种情况1.left和right之间(含left和right元素)共有奇数个,此时中轴线穿过正中间的元素判断方法:right-left的值为偶数,即(right-left)%2=0。此时(left+right)/2恰为整数,此结果恰为left与right下标之间的中值下标,正好在中轴线上2.left......
  • 【git pull】 error: You have not concluded your merge (MERGE_HEAD exists).
    问题$gitpullerror:Youhavenotconcludedyourmerge(MERGE_HEADexists).......
  • 中值计算为什么一般用left+(right-left)/2而不是(right+left)/2
    left+(right-left)/2和(right+left)/2两个计算的结果是一样的,但是1、对于16位编译器,int占16位(2字节)。int的最大值为32767.2、对于32位和64位编译器,int占32位(4字节)。int的最大值为2147483647使用(right+left)/2当right+left的值超过int的最大值的时候就会溢出而轮不到/......
  • os.path.join可以自动处理不合理的地址拼接
    结论:os.path.join比'\\'.join([])智能的地方在于os.path.join可以自动处理不合理的地址拼接截图:......
  • Java多线程____线程状态Join()方法
    代码:packagecom.test.thread;publicclassTestRunnableimplementsRunnable{ @Override publicvoidrun(){ synchronized(this){ for(inti=0;i<10;i++){ System.out.println(Thread.currentThread().getName()); } } } }packagecom......
  • 25进程/join方法/互斥锁
    代码创建进程"""创建进程的方式有哪些1.鼠标双击桌面一个应用图标2.代码创建创建进程的本质:在内存中申请一块内存空间用于运行相应的程序代码"""#第一种创建进程的方式》》对象#frommultiprocessingimportProcess#importtime###deftask(name)......