首页 > 其他分享 >exist和left join 性能对比

exist和left join 性能对比

时间:2023-09-18 16:26:26浏览次数:36  
标签:rows join family .. app a1 exist id left

今天遇到一个性能问题,再调优过程中发现耗时最久的计划是exist 部分涉及的三个表。

然后计划用left join 来替换exist,然后查询了很多资料,大部分都说exist和left join 性能差不多。 为了验证这一结论进行了如下实验

步骤如下

1、创建测试表

drop table app_family;
CREATE TABLE app_family (
"family_id" character varying(32 char) NOT NULL,
"application_id" character varying(32 char) NULL,
"family_number" character varying(50 char) ,
"household_register_number" character varying(50 char),
"poverty_reason" character varying(32 char),
CONSTRAINT "pk_app_family_idpk" PRIMARY KEY (family_id));
insert into app_family select generate_series(1,1000000),generate_series(1,1000000),'aaaa','aaa','bbb' from dual ;

create table app_family2 as select * from app_family;

create table app_memeber as select * from app_family;

2、验证两张表join和exist 性能对比

语句1、两张表exist

explain analyze select a1.application_id,a1.family_id from app_family a1 where
a1.family_id >1000 and
EXISTS(
SELECT
1
FROM
app_family2 a2
WHERE
a2.application_id=a1.application_id
and a2.family_id > 500000
)

总计用时404 ms

QUERY PLAN |
-------------------------------------------------------------------------------------------------------------------------------------------------+
Gather (cost=16384.11..42022.84 rows=111111 width=12) (actual time=216.416..383.339 rows=500000 loops=1) |
Workers Planned: 2 |
Workers Launched: 2 |
-> Parallel Hash Semi Join (cost=15384.11..29911.74 rows=46296 width=12) (actual time=211.871..328.103 rows=166667 loops=3) |
Hash Cond: ((a1.application_id)::text = (a2.application_id)::text) |
-> Parallel Seq Scan on app_family a1 (cost=0.00..13648.00 rows=138889 width=12) (actual time=0.049..72.722 rows=333000 loops=3) |
Filter: ((family_id)::integer > 1000) |
Rows Removed by Filter: 333 |
-> Parallel Hash (cost=13648.00..13648.00 rows=138889 width=6) (actual time=116.903..116.903 rows=166667 loops=3) |
Buckets: 524288 Batches: 1 Memory Usage: 23712kB |
-> Parallel Seq Scan on app_family2 a2 (cost=0.00..13648.00 rows=138889 width=6) (actual time=35.008..71.846 rows=166667 loops=3)|
Filter: ((family_id)::integer > 500000) |
Rows Removed by Filter: 166667 |
Planning Time: 0.147 ms |
Execution Time: 404.901 ms | |

语句2 两张表join

explain analyze select a1.application_id,a1.family_id from app_family a1 LEFT JOIN app_family2 a2 ON a2.application_id=a1.application_id
WHERE a1.family_id >1000 AND a2.family_id > 500000

总计执行时间416ms

QUERY PLAN |
-------------------------------------------------------------------------------------------------------------------------------------------------+
Gather (cost=16384.11..41856.95 rows=111111 width=12) (actual time=222.545..393.519 rows=500000 loops=1) |
Workers Planned: 2 |
Workers Launched: 2 |
-> Parallel Hash Join (cost=15384.11..29745.85 rows=46296 width=12) (actual time=217.701..337.510 rows=166667 loops=3) |
Hash Cond: ((a1.application_id)::text = (a2.application_id)::text) |
-> Parallel Seq Scan on app_family a1 (cost=0.00..13648.00 rows=138889 width=12) (actual time=0.060..68.735 rows=333000 loops=3) |
Filter: ((family_id)::integer > 1000) |
Rows Removed by Filter: 333 |
-> Parallel Hash (cost=13648.00..13648.00 rows=138889 width=6) (actual time=130.917..130.917 rows=166667 loops=3) |
Buckets: 524288 Batches: 1 Memory Usage: 23712kB |
-> Parallel Seq Scan on app_family2 a2 (cost=0.00..13648.00 rows=138889 width=6) (actual time=37.127..80.126 rows=166667 loops=3)|
Filter: ((family_id)::integer > 500000) |
Rows Removed by Filter: 166667 |
Planning Time: 0.146 ms |
Execution Time: 416.346 ms |

两张表场景总结

针对两张表的对比可以发现join还相对满了10几ms但是总的来说两边 差异不大。所以再两张表的关联情况下 join和exist 性能相近。

3、验证3张表join和exist 性能对比

语句1 三张表exist

本场景最开始执行时 exit 用户6s多,原因时用到了内存排序,后来调整了work_mem 排除了内存排序的影响,最终执行时间

2036 ms

explain analyze select a1.application_id,a1.family_id from app_family a1 ,app_family2 a2 where
a1.family_id >1000 and a2.family_id < 900000 and
EXISTS(
SELECT
1
FROM
app_memeber m
WHERE
m.application_id=a1.application_id
and m.family_id=a2.family_id
)

QUERY PLAN |
------------------------------------------------------------------------------------------------------------------------------------------------------+
Gather (cost=61282.11..88664.67 rows=111111 width=12) (actual time=1347.081..1976.824 rows=898999 loops=1) |
Workers Planned: 2 |
Workers Launched: 2 |
-> Parallel Hash Join (cost=60282.11..76553.57 rows=46296 width=12) (actual time=1386.451..1859.819 rows=299666 loops=3) |
Hash Cond: ((m.family_id)::text = (a2.family_id)::text) |
-> Hash Join (cost=44898.00..60455.72 rows=138889 width=18) (actual time=1186.741..1486.233 rows=333000 loops=3) |
Hash Cond: ((a1.application_id)::text = (m.application_id)::text) |
-> Parallel Seq Scan on app_family a1 (cost=0.00..13648.00 rows=138889 width=12) (actual time=0.074..76.516 rows=333000 loops=3) |
Filter: ((family_id)::integer > 1000) |
Rows Removed by Filter: 333 |
-> Hash (cost=32398.00..32398.00 rows=1000000 width=12) (actual time=1181.785..1181.786 rows=1000000 loops=3) |
Buckets: 1048576 Batches: 1 Memory Usage: 52897kB |
-> HashAggregate (cost=22398.00..32398.00 rows=1000000 width=12) (actual time=618.475..910.329 rows=1000000 loops=3) |
Group Key: (m.application_id)::text, (m.family_id)::text |
-> Seq Scan on app_memeber m (cost=0.00..17398.00 rows=1000000 width=12) (actual time=0.070..158.845 rows=1000000 loops=3)|
-> Parallel Hash (cost=13648.00..13648.00 rows=138889 width=6) (actual time=198.377..198.377 rows=300000 loops=3) |
Buckets: 1048576 (originally 524288) Batches: 1 (originally 1) Memory Usage: 47552kB |
-> Parallel Seq Scan on app_family2 a2 (cost=0.00..13648.00 rows=138889 width=6) (actual time=0.098..72.933 rows=300000 loops=3) |
Filter: ((family_id)::integer < 900000) |
Rows Removed by Filter: 33334 |
Planning Time: 0.495 ms |
Execution Time: 2036.815 ms |

语句2 三张表join

为了保证语句的一致性,三张表的join顺序保持和语句1的执行计划中的顺序一致,join总计用时897ms

explain analyze select a1.application_id,a1.family_id from app_family a1
left join app_memeber m on a1.application_id = m.application_id LEFT JOIN app_family2 a2 ON m.family_id = a2.family_id
WHERE a1.family_id >1000 AND a2.family_id < 900000

QUERY PLAN |
------------------------------------------------------------------------------------------------------------------------------------------------------+
Gather (cost=31768.22..57298.93 rows=111111 width=12) (actual time=386.163..865.284 rows=898999 loops=1) |
Workers Planned: 2 |
Workers Launched: 2 |
-> Parallel Hash Join (cost=30768.22..45187.83 rows=46296 width=12) (actual time=381.211..765.177 rows=299666 loops=3) |
Hash Cond: ((m.family_id)::text = (a2.family_id)::text) |
-> Parallel Hash Join (cost=15384.11..29089.98 rows=138889 width=18) (actual time=191.233..413.917 rows=333000 loops=3) |
Hash Cond: ((m.application_id)::text = (a1.application_id)::text) |
-> Parallel Seq Scan on app_memeber m (cost=0.00..11564.67 rows=416667 width=12) (actual time=0.025..33.077 rows=333333 loops=3) |
-> Parallel Hash (cost=13648.00..13648.00 rows=138889 width=12) (actual time=189.919..189.920 rows=333000 loops=3) |
Buckets: 1048576 (originally 524288) Batches: 1 (originally 1) Memory Usage: 59200kB |
-> Parallel Seq Scan on app_family a1 (cost=0.00..13648.00 rows=138889 width=12) (actual time=0.060..78.600 rows=333000 loops=3)|
Filter: ((family_id)::integer > 1000) |
Rows Removed by Filter: 333 |
-> Parallel Hash (cost=13648.00..13648.00 rows=138889 width=6) (actual time=188.522..188.523 rows=300000 loops=3) |
Buckets: 1048576 (originally 524288) Batches: 1 (originally 1) Memory Usage: 47584kB |
-> Parallel Seq Scan on app_family2 a2 (cost=0.00..13648.00 rows=138889 width=6) (actual time=0.052..81.396 rows=300000 loops=3) |
Filter: ((family_id)::integer < 900000) |
Rows Removed by Filter: 33334 |
Planning Time: 0.167 ms |
Execution Time: 897.552 ms |

总结三张表场景

在三张表的场景下exist用时2036 ms ,join用时897ms 可见 join的顺序明显优于exist。

在三张表的场景下可以看到,针对中间表appmember扫描时, exist语句用到HashAggregate 并做了 Group Key,所以导致exist 执行时间增加。如果work_mem 配置不合适时间会更长。

标签:rows,join,family,..,app,a1,exist,id,left
From: https://www.cnblogs.com/kingbase/p/17561117.html

相关文章

  • 软件测试|MySQL CROSS JOIN:交叉连接的详细解析
    简介在MySQL数据库中,CROSSJOIN是一种用于生成两个或多个表的笛卡尔积的连接方法。CROSSJOIN不需要任何连接条件,它将左表的每一行与右表的每一行进行组合,从而生成一个包含所有可能组合的结果集。本文将详细介绍MySQL中的CROSSJOIN概念,并提供示例来加深理解。什么是CROSS......
  • 软件测试|MySQL INNER JOIN:内连接的详细解析
    简介在关系型数据库中,数据通常存储在多个表中,并且这些表之间可能存在关联关系。为了从多个表中检索相关数据,SQL提供了不同类型的连接操作,其中之一就是内连接(INNERJOIN)。本文将详细解析MySQL内连接的概念、语法和使用案例。什么是内连接(INNERJOIN)?内连接是一种用于从两个或多个表中......
  • exist和left join 性能对比
    今天遇到一个性能问题,再调优过程中发现耗时最久的计划是exist部分涉及的三个表。然后计划用leftjoin来替换exist,然后查询了很多资料,大部分都说exist和leftjoin性能差不多。为了验证这一结论进行了如下实验步骤如下1、创建测试表droptableapp_family;CREATETABLEapp......
  • mysql update、delete left join 操作
    1delete删除多表DELETEq,pqfromcom_purchasequoteqLEFTJOINcom_purchasequote_snapshootpqonq.id=pq.idwhereq.id=84110;删除一张表DELETEqfromcom_purchasequoteqLEFTJOINcom_purchasequote_snapshootpqonq.id=pq.idwhereq.id=82347;DELETEpqfrom......
  • wait、sleep、yield、join的区别
    两个概念1、锁队列所有需要竞争同步锁的线程都会放在锁队列中,比如当前对象的锁已经被一个线程得到,则其他线程都需要在这个锁队列中进行等待,当前面的线程释放同步锁后,锁队列中的线程去竞争同步锁,当某个线程得到后会进入就绪队列进行等待CPU资源分配。2、等待队列(wait方法)在调用......
  • MySQL中exists和in的区别
    https://blog.csdn.net/m0_50370837/article/details/124239171?ops_request_misc=&request_id=&biz_id=102&utm_term=mysql%20IN%20%E5%92%8CEXISTS%20%E7%9A%84%E5%8C%BA%E5%88%AB&utm_medium=distribute.pc_search_result.none-task-blog-2~all~sobaiduweb......
  • 双边快排的基准点和先判断left还是right问题
     前同事问了我一个双边快排的算法,他问我怎么都无法正常排序,代码如下,publicstaticvoidmain(String[]args){int[]arr=newint[]{7,3,6,4,8,9,0,22,28,2,3,79,24};arr=newint[]{4,4,6,5,3,2,8,1};System.out.println("left:"+0+"right:"......
  • 对于数组中取下标中值操作int mid=(left+right)/2的讨论
    分两种情况1.left和right之间(含left和right元素)共有奇数个,此时中轴线穿过正中间的元素判断方法:right-left的值为偶数,即(right-left)%2=0。此时(left+right)/2恰为整数,此结果恰为left与right下标之间的中值下标,正好在中轴线上2.left......
  • 【git pull】 error: You have not concluded your merge (MERGE_HEAD exists).
    问题$gitpullerror:Youhavenotconcludedyourmerge(MERGE_HEADexists).......
  • 中值计算为什么一般用left+(right-left)/2而不是(right+left)/2
    left+(right-left)/2和(right+left)/2两个计算的结果是一样的,但是1、对于16位编译器,int占16位(2字节)。int的最大值为32767.2、对于32位和64位编译器,int占32位(4字节)。int的最大值为2147483647使用(right+left)/2当right+left的值超过int的最大值的时候就会溢出而轮不到/......