背景
在查询语句中,如果 Select 子句中,调用较为耗时的函数或子查询,比较各种SQL语句的编写方案,得出其中最优选择。
一、数据准备,SQL 语句
- 模拟较耗时的用户函数或子查询
确保执行子查询的时长是1秒。
create or replace function f001() returns int stable language sql as $$ select 1 from pg_sleep(1); $$;
- 模拟返回多行数据的子查询
结果集中,关联条件列含有重复值。
test=# create table t1 as select sn, id from (select generate_series(1, 3) sn), (select generate_series(3, 8) id); SELECT 18 test=# create table t2 as select generate_series(1, 5) id; SELECT 5 test=# select * from t1; sn | id ----+---- 1 | 3 2 | 3 3 | 3 1 | 4 2 | 4 3 | 4 1 | 5 2 | 5 3 | 5 1 | 6 2 | 6 3 | 6 1 | 7 2 | 7 3 | 7 1 | 8 2 | 8 3 | 8 (18 rows) test=# select * from t2; id ---- 1 2 3 4 5 (5 rows)
二、查询全体数据的优化方案
1、初始SQL
with a as (select sn, id from t1 ) select a.*, (select id + f001() sq_sum from t2 b where b.id = a.id) from a where 1 = 1; sn | id | sq_sum ----+----+-------- 1 | 3 | 4 2 | 3 | 4 3 | 3 | 4 1 | 4 | 5 2 | 4 | 5 3 | 4 | 5 1 | 5 | 6 2 | 5 | 6 3 | 5 | 6 1 | 6 | 2 | 6 | 3 | 6 | 1 | 7 | 2 | 7 | 3 | 7 | 1 | 8 | 2 | 8 | 3 | 8 | QUERY PLAN ------------------------------------------------------------------------------------------------------------- Seq Scan on t1 (cost=0.00..101806.05 rows=2260 width=12) (actual time=1000.638..9008.584 rows=18 loops=1) SubPlan 1 -> Seq Scan on t2 b (cost=0.00..45.03 rows=13 width=4) (actual time=500.465..500.468 rows=0 loops=18) Filter: (id = t1.id) Rows Removed by Filter: 4 Planning Time: 0.149 ms Execution Time: 9008.667 ms (7 rows)
2、CTE
使用临时表的结果进行连接,避免循环。注意,这里CTE 有materilaized 选项,主要是把耗时的部分先执行出结果,避免与其他部分查询合并,引发多次执行。
with a as (select sn, id from t1), b as materialized (select id, id + f001() sq_sum from t2 ) select a.*, (select sq_sum from b where b.id = a.id) from a where 1 = 1; QUERY PLAN --------------------------------------------------------------------------------------------------------------- Seq Scan on t1 (cost=676.75..129868.35 rows=2260 width=12) (actual time=5004.377..5004.392 rows=18 loops=1) CTE b -> Seq Scan on t2 (cost=0.00..676.75 rows=2540 width=8) (actual time=1000.562..5004.348 rows=5 loops=1) SubPlan 2 -> CTE Scan on b (cost=0.00..57.15 rows=13 width=4) (actual time=166.821..278.021 rows=0 loops=18) Filter: (id = t1.id) Rows Removed by Filter: 4 Planning Time: 0.107 ms Execution Time: 5004.441 ms (9 rows)
3、LEFT JOIN 子查询
与上例一样,还是利用materialize 特性,避免了函数的多次调用。
with a as (select sn, id from t1) select a.*, b.sq_sum from a left join (select id, id + f001() sq_sum from t2 ) b on b.id = a.id where 1 = 1 ; QUERY PLAN ------------------------------------------------------------------------------------------------------------------- Nested Loop Left Join (cost=0.00..86821.70 rows=28702 width=12) (actual time=3002.905..5005.627 rows=18 loops=1) Join Filter: (t2.id = t1.id) Rows Removed by Join Filter: 81 -> Seq Scan on t1 (cost=0.00..32.60 rows=2260 width=8) (actual time=0.007..0.011 rows=18 loops=1) -> Materialize (cost=0.00..689.45 rows=2540 width=8) (actual time=55.621..278.088 rows=5 loops=18) -> Seq Scan on t2 (cost=0.00..676.75 rows=2540 width=8) (actual time=1001.176..5005.568 rows=5 loops=1) Planning Time: 0.180 ms Execution Time: 5005.650 ms (8 rows)
三、查询局部数据(过滤主表的条件)的优化方案
1、初始SQL
with a as (select sn, id from t1) select a.*, (select id + f001() sq_sum from t2 b where b.id = a.id) from a where 1 = 1 and a.id = 3; sn | id | sq_sum ----+----+-------- 1 | 3 | 4 2 | 3 | 4 3 | 3 | 4 (3 rows) QUERY PLAN -------------------------------------------------------------------------------------------------------------- Seq Scan on t1 (cost=0.00..533.61 rows=11 width=12) (actual time=1000.356..3002.635 rows=3 loops=1) Filter: (id = 3) Rows Removed by Filter: 15 SubPlan 1 -> Seq Scan on t2 b (cost=0.00..45.03 rows=13 width=4) (actual time=1000.859..1000.866 rows=1 loops=3) Filter: (id = t1.id) Rows Removed by Filter: 4 Planning Time: 0.164 ms Execution Time: 3002.657 ms (9 rows)
2、CTE
with a as (select sn, id from t1), b as materialized (select id, id + f001() sq_sum from t2) select a.*, (select sq_sum from b where b.id = a.id) sq_sum from a where 1 = 1 and a.id=3 ; QUERY PLAN --------------------------------------------------------------------------------------------------------------- Seq Scan on t1 (cost=676.75..1343.65 rows=11 width=12) (actual time=5004.958..5004.965 rows=3 loops=1) Filter: (id = 3) Rows Removed by Filter: 15 CTE b -> Seq Scan on t2 (cost=0.00..676.75 rows=2540 width=8) (actual time=1001.387..5004.930 rows=5 loops=1) SubPlan 2 -> CTE Scan on b (cost=0.00..57.15 rows=13 width=4) (actual time=1001.132..1668.316 rows=1 loops=3) Filter: (id = t1.id) Rows Removed by Filter: 4 Planning Time: 0.105 ms Execution Time: 5004.983 ms (11 rows)
3、LEFT JOIN 子查询
explain analyse with a as (select sn, id from t1) select a.*, b.sq_sum from a left join (select id, id + f001() sq_sum from t2) b on b.id = a.id where 1 = 1 and a.id = 3 ; QUERY PLAN ---------------------------------------------------------------------------------------------------------------- Nested Loop Left Join (cost=0.00..85.46 rows=143 width=12) (actual time=1000.972..1000.980 rows=3 loops=1) Join Filter: (t2.id = t1.id) -> Seq Scan on t1 (cost=0.00..38.25 rows=11 width=8) (actual time=0.008..0.009 rows=3 loops=1) Filter: (id = 3) Rows Removed by Filter: 15 -> Materialize (cost=0.00..45.10 rows=13 width=8) (actual time=333.654..333.656 rows=1 loops=3) -> Seq Scan on t2 (cost=0.00..45.03 rows=13 width=8) (actual time=1000.959..1000.964 rows=1 loops=1) Filter: (id = 3) Rows Removed by Filter: 4 Planning Time: 0.107 ms Execution Time: 1000.996 ms (11 rows)
四、查询局部数据(过滤从表的条件)的优化方案
1、初始SQL
with a as (select sn, id from (select generate_series(1, 3) sn), (select generate_series(3, 8) id)) select a.*, (select id + f001() sq_sum from (select generate_series(1, 5) id) b where b.id = a.id) sq_sum from a where 1 = 1 and sq_sum = 6 ;
sn | id | sq_sum ----+----+-------- 1 | 5 | 6 2 | 5 | 6 3 | 5 | 6 (3 行记录) 时间:6007.526 ms (00:06.008) QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------- Nested Loop (cost=0.00..3.43 rows=3 width=12) (actual time=4004.455..6006.681 rows=3 loops=1) -> Subquery Scan on "SYSINTERNAL-4-1" (cost=0.00..2.27 rows=1 width=4) (actual time=3003.273..3003.289 rows=1 loops=1) Filter: ((SubPlan 2) = 6) Rows Removed by Filter: 5 -> ProjectSet (cost=0.00..0.05 rows=6 width=4) (actual time=0.001..0.005 rows=6 loops=1) -> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.000..0.001 rows=1 loops=1) SubPlan 2 -> Subquery Scan on b_1 (cost=0.00..0.36 rows=1 width=4) (actual time=500.541..500.544 rows=0 loops=6) Filter: (b_1.id = "SYSINTERNAL-4-1".id) Rows Removed by Filter: 4 -> ProjectSet (cost=0.00..0.04 rows=5 width=4) (actual time=0.001..0.003 rows=5 loops=6) -> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.000..0.001 rows=1 loops=6) -> ProjectSet (cost=0.00..0.03 rows=3 width=4) (actual time=0.002..0.006 rows=3 loops=1) -> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.000..0.001 rows=1 loops=1) SubPlan 1 -> Subquery Scan on b (cost=0.00..0.36 rows=1 width=4) (actual time=1001.116..1001.122 rows=1 loops=3) Filter: (b.id = "SYSINTERNAL-4-1".id) Rows Removed by Filter: 4 -> ProjectSet (cost=0.00..0.04 rows=5 width=4) (actual time=0.001..0.006 rows=5 loops=3) -> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.000..0.001 rows=1 loops=3) Planning Time: 0.121 ms Execution Time: 6006.709 ms (22 行记录) 时间:6007.303 ms (00:06.007)
2、CTE
explain analyse with a as (select sn, id from (select generate_series(1, 3) sn), (select generate_series(3, 8) id)), b as (select id, id + f001() sq_sum from generate_series(1, 5) id) select a.*, (select sq_sum from b where b.id = a.id) sq_sum from a where 1 = 1 and sq_sum = 6 ; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------- Nested Loop (cost=1.31..2.54 rows=3 width=12) (actual time=5005.414..5005.422 rows=3 loops=1) CTE b -> Function Scan on generate_series id (cost=0.00..1.31 rows=5 width=8) (actual time=1001.093..5005.376 rows=5 loops=1) -> Subquery Scan on "SYSINTERNAL-4-1" (cost=0.00..0.80 rows=1 width=4) (actual time=5005.406..5005.410 rows=1 loops=1) Filter: ((SubPlan 3) = 6) Rows Removed by Filter: 5 -> ProjectSet (cost=0.00..0.05 rows=6 width=4) (actual time=0.002..0.005 rows=6 loops=1) -> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.000..0.001 rows=1 loops=1) SubPlan 3 -> CTE Scan on b b_1 (cost=0.00..0.11 rows=1 width=4) (actual time=500.540..834.232 rows=0 loops=6) Filter: (id = "SYSINTERNAL-4-1".id) Rows Removed by Filter: 4 -> ProjectSet (cost=0.00..0.03 rows=3 width=4) (actual time=0.002..0.003 rows=3 loops=1) -> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.000..0.000 rows=1 loops=1) SubPlan 2 -> CTE Scan on b (cost=0.00..0.11 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=3) Filter: (id = "SYSINTERNAL-4-1".id) Rows Removed by Filter: 4 Planning Time: 0.143 ms Execution Time: 5005.449 ms (20 行记录) 时间:5006.115 ms (00:05.006)
3、LEFT JOIN
explain analyse with a as (select sn, id from (select generate_series(1, 3) sn), (select generate_series(3, 8) id)) select a.*, b.sq_sum from a left join (select id, id + f001() sq_sum from generate_series(1, 5) id) b on b.id = a.id where 1 = 1 and sq_sum = 6 ; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------- Nested Loop (cost=1.39..1.62 rows=3 width=12) (actual time=5005.285..5005.291 rows=3 loops=1) -> Hash Join (cost=1.39..1.53 rows=1 width=8) (actual time=5005.281..5005.285 rows=1 loops=1) Hash Cond: ((generate_series(3, 8)) = b.id) -> ProjectSet (cost=0.00..0.05 rows=6 width=4) (actual time=0.002..0.005 rows=6 loops=1) -> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.001 rows=1 loops=1) -> Hash (cost=1.38..1.38 rows=1 width=8) (actual time=5005.270..5005.270 rows=1 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 9kB -> Subquery Scan on b (cost=0.00..1.38 rows=1 width=8) (actual time=5005.264..5005.266 rows=1 loops=1) Filter: (b.sq_sum = 6) Rows Removed by Filter: 4 -> Function Scan on generate_series id (cost=0.00..1.31 rows=5 width=8) (actual time=1001.089..5005.254 rows=5 loops=1) -> ProjectSet (cost=0.00..0.03 rows=3 width=4) (actual time=0.002..0.003 rows=3 loops=1) -> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.000..0.001 rows=1 loops=1) Planning Time: 0.113 ms Execution Time: 5005.320 ms (15 行记录) 时间:5005.873 ms (00:05.006)
4、LATERAL 连接 CTE
explain analyse with a as (select sn, id from (select generate_series(1, 3) sn), (select generate_series(3, 8) id)), b as (select id, id + f001() sq_sum from generate_series(1, 5) id) select a.*, b.sq_sum from a join lateral (select * from b where b.id = a.id ) b on true where 1 = 1 and sq_sum = 6; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------- Nested Loop (cost=1.44..1.67 rows=3 width=12) (actual time=5005.492..5005.499 rows=3 loops=1) CTE b -> Function Scan on generate_series id (cost=0.00..1.31 rows=5 width=8) (actual time=1001.103..5005.440 rows=5 loops=1) -> Hash Join (cost=0.12..0.27 rows=1 width=8) (actual time=5005.485..5005.489 rows=1 loops=1) Hash Cond: ((generate_series(3, 8)) = b.id) -> ProjectSet (cost=0.00..0.05 rows=6 width=4) (actual time=0.002..0.006 rows=6 loops=1) -> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.000..0.001 rows=1 loops=1) -> Hash (cost=0.11..0.11 rows=1 width=8) (actual time=5005.472..5005.472 rows=1 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 9kB -> CTE Scan on b (cost=0.00..0.11 rows=1 width=8) (actual time=5005.457..5005.459 rows=1 loops=1) Filter: (sq_sum = 6) Rows Removed by Filter: 4 -> ProjectSet (cost=0.00..0.03 rows=3 width=4) (actual time=0.004..0.005 rows=3 loops=1) -> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.001 rows=1 loops=1) Planning Time: 0.205 ms Execution Time: 5005.531 ms (16 行记录) 时间:5006.765 ms (00:05.007)
5、LATERAL 连接子查询
explain analyse with a as (select sn, id from (select generate_series(1, 3) sn), (select generate_series(3, 8) id)) select a.*, b.sq_sum from a left join lateral (select id, id + f001() sq_sum from generate_series(1, 5) id where id = a.id ) b on true where 1 = 1 and sq_sum = 6 ; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------- Nested Loop (cost=0.00..2.44 rows=18 width=12) (actual time=3003.267..3003.280 rows=3 loops=1) -> Nested Loop (cost=0.00..2.15 rows=6 width=8) (actual time=3003.257..3003.268 rows=1 loops=1) -> ProjectSet (cost=0.00..0.05 rows=6 width=4) (actual time=0.002..0.007 rows=6 loops=1) -> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.000..0.002 rows=1 loops=1) -> Subquery Scan on b (cost=0.00..0.33 rows=1 width=4) (actual time=500.541..500.542 rows=0 loops=6) Filter: (b.sq_sum = 6) Rows Removed by Filter: 0 -> Function Scan on generate_series id (cost=0.00..0.32 rows=1 width=8) (actual time=500.539..500.540 rows=0 loops=6) Filter: (id = (generate_series(3, 8))) Rows Removed by Filter: 4 -> Materialize (cost=0.00..0.08 rows=3 width=4) (actual time=0.007..0.009 rows=3 loops=1) -> ProjectSet (cost=0.00..0.03 rows=3 width=4) (actual time=0.005..0.006 rows=3 loops=1) -> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.002 rows=1 loops=1) Planning Time: 0.125 ms Execution Time: 3003.306 ms (15 行记录) 时间:3003.950 ms (00:03.004)
总结
- Select 子句中,表达式会逐行运算,总时长与结果集成正比。
- CTE子句,先计算全部结果,然后关联主表,总时长是可控。
- 使用子查询和 LATERAL 连接,可以避免结果集的重复计算。
标签:调用,cost,..,耗时,rows,SQL,loops,actual,id From: https://www.cnblogs.com/kingbase/p/16041197.html