首页 > 其他分享 >Semi-Join Subquery优化策略

Semi-Join Subquery优化策略

时间:2023-05-06 18:11:07浏览次数:54  
标签:City Join Semi Country PRIMARY 查询 Subquery select Population

Semi-Join Subquery优化策略

Semi-Join Subquery(半连接子查询):对应IN或EXISTS子查询,仅需要检查"外表记录"在"子查询结果集"中是否存在匹配记录,不需要计算"子查询结果集"中记录匹配次数,也不需要返回"子查询结果集"中匹配记录内容

在MariaDB(MySQL)中,常用优化Semi-Join(半连接)的策略有:

  • First Match
  • Table Pullout
  • Semi-join Materialization
  • Loose Scan
  • Duplicate Weedout

First Match策略

当循环"外部查询结果集"的每条记录去"子查询中"确认"是否匹配"时,只需要找到第一条匹配记录(First Match)既可跳出子查询。

如下面查询:

SELECT * FROM Country 
WHERE Country.code IN (
    SELECT City.Country 
    FROM City 
    WHERE City.Population > 1*1000*1000
)
AND Country.continent='Europe'

如果不使用First Match策略,当处理到Country表上满足" Country.continent='Europe' "条件的德国(Deu)记录时,会扫描City表上满足" City.Population > 110001000 AND City.Country='DEU' "的所有记录,再根据匹配记录总数返回"是否匹配"结果:

如果使用First Match策略,当处理到Country表上满足" Country.continent='Europe' "条件的德国(Deu)记录时,会扫描City表上满足" City.Population > 110001000 AND City.Country='DEU' "的第一条记录"Berlin"后,立即返回"是否匹配"结果:

在MariaDB上使用First Match策略的查询的执行计划为:

MariaDB [world]> explain select * from Country where Country.code IN (select City.Country from City where City.Population > 1*1000*1000) and Country.continent='Europe';
+----+-------------+---------+------+--------------------+-----------+---------+--------------------+------+----------------------------------+
| id | select_type | table   | type | possible_keys      | key       | key_len | ref                | rows | Extra                            |
+----+-------------+---------+------+--------------------+-----------+---------+--------------------+------+----------------------------------+
|  1 | PRIMARY     | Country | ref  | PRIMARY,continent  | continent | 17      | const              |   60 | Using index condition            |
|  1 | PRIMARY     | City    | ref  | Population,Country | Country   | 3       | world.Country.Code |   18 | Using where; FirstMatch(Country) |
+----+-------------+---------+------+--------------------+-----------+---------+--------------------+------+----------------------------------+
2 rows in set (0.00 sec)

MariaDB的执行计划中会有明显的FirstMatch标识。

在MySQL上使用First Match策略的查询的执行计划为:

MySQL [world]> explain select * from Country  where Country.code IN (select City.Country from City where City.Population > 1*1000*1000) and Country.continent='Europe';
+----+--------------------+---------+----------------+--------------------+-----------+---------+-------+------+------------------------------------+
| id | select_type        | table   | type           | possible_keys      | key       | key_len | ref   | rows | Extra                              |
+----+--------------------+---------+----------------+--------------------+-----------+---------+-------+------+------------------------------------+
|  1 | PRIMARY            | Country | ref            | continent          | continent | 17      | const |   60 | Using index condition; Using where |
|  2 | DEPENDENT SUBQUERY | City    | index_subquery | Population,Country | Country   | 3       | func  |   18 | Using where                        |
+----+--------------------+---------+----------------+--------------------+-----------+---------+-------+------+------------------------------------+
2 rows in set (0.01 sec)

MariaDB的执行计划中仅显示为依赖子查询(DEPENDENT SUBQUERY)

First Match策略和将IN子查询转换为EXISTS依赖子查询很相似,但两者还是存在明显差异,并非所有EXISTS操作都能使用First Match策略,如子查询中使用GROUP BY相关的聚合函数时,需要先完成GROUP BY操作才能确认"是否匹配"。

Table Pullout策略

当子查询的查询列表项只有主键或唯一索引键时,能推算出"子查询结果集"不存在重复记录,因此可以将子查询改为关联查询,即将子查询中的表上提到关联查询。

对于查询:

SELECT *
FROM City 
WHERE City.Country IN (
	SELECT Country.Code
	FROM Country 
	WHERE Country.Population < 100*1000
);

在MariaDB 5.2 和MySQL 5.6版本及之前版本上,执行计划为:

MySQL [world]> explain select * from City where City.Country in (select Country.Code from Country where Country.Population < 100*1000);
+----+--------------------+---------+-----------------+--------------------+---------+---------+------+------+-------------+
| id | select_type        | table   | type            | possible_keys      | key     | key_len | ref  | rows | Extra       |
+----+--------------------+---------+-----------------+--------------------+---------+---------+------+------+-------------+
|  1 | PRIMARY            | City    | ALL             | NULL               | NULL    | NULL    | NULL | 4079 | Using where |
|  2 | DEPENDENT SUBQUERY | Country | unique_subquery | PRIMARY,Population | PRIMARY | 3       | func |    1 | Using where |
+----+--------------------+---------+-----------------+--------------------+---------+---------+------+------+-------------+
2 rows in set (0.00 sec)

如果Country.Code是主键或唯一索引,则可以将SQL改写为:

SELECT City.* 
FROM City
INNER JOIN Country 
ON City.Country=Country.Code
WHERE Country.Population < 100*1000;

改为关联查询后,可以根据两张关联表的统计数据来选择驱动表和被驱动表,因此在MariaDB 5.3或MySQL 5.7版本,执行计划为:

MariaDB [world]> explain select * from City where City.Country in (select Country.Code from Country where Country.Population < 100*1000);
+----+-------------+---------+-------+--------------------+------------+---------+--------------------+------+-----------------------+
| id | select_type | table   | type  | possible_keys      | key        | key_len | ref                | rows | Extra                 |
+----+-------------+---------+-------+--------------------+------------+---------+--------------------+------+-----------------------+
|  1 | PRIMARY     | Country | range | PRIMARY,Population | Population | 4       | NULL               |   37 | Using index condition |
|  1 | PRIMARY     | City    | ref   | Country            | Country    | 3       | world.Country.Code |   18 |                       |
+----+-------------+---------+-------+--------------------+------------+---------+--------------------+------+-----------------------+
2 rows in set (0.00 sec)

Materialization策略

在使用Table Pullout策略时,需要能明确推算出"子查询结果集"不存在重复记录时才能将"子查询"改为"关联查询",如果将"子查询结果集"通过临时表去重固化后消除重复记录,则可以将子查询转换为"关联查询",即Materialization策略。

如对于查询:

SELECT * FROM Country 
WHERE Country.code IN (
    SELECT City.Country 
    FROM City 
    WHERE City.Population > 1*1000*1000
)
AND Country.continent='Europe'

在转换为"关联查询"后,按照"关联查询"中临时表是否为"驱动表"可以将Semi-join Materialization策略细分为:

  • Materialization/scan 策略,将临时表作为"驱动表",遍历临时表中每条记录去另外关联表中查找匹配记录。
  • Materialization/lookup 策略,将临时表作为"被驱动表",遍历另外的关联表在临时表中查询匹配记录。

使用Materialization/scan 策略时,MariaDB 查询计划为:

MariaDB [world]> explain select * from Country where Country.code IN (select City.Country from City where  City.Population > 7*1000*1000);
+----+--------------+-------------+--------+--------------------+------------+---------+--------------------+------+-----------------------+
| id | select_type  | table       | type   | possible_keys      | key        | key_len | ref                | rows | Extra                 |
+----+--------------+-------------+--------+--------------------+------------+---------+--------------------+------+-----------------------+
|  1 | PRIMARY      | <subquery2> | ALL    | distinct_key       | NULL       | NULL    | NULL               |   15 |                       |
|  1 | PRIMARY      | Country     | eq_ref | PRIMARY            | PRIMARY    | 3       | world.City.Country |    1 |                       |
|  2 | MATERIALIZED | City        | range  | Population,Country | Population | 4       | NULL               |   15 | Using index condition |
+----+--------------+-------------+--------+--------------------+------------+---------+--------------------+------+-----------------------+
3 rows in set (0.01 sec)

使用Materialization/lookup 策略时,MariaDB 查询计划为:

MariaDB [world]> explain select * from Country where Country.code IN (select City.Country from City where  City.Population > 1*1000*1000) ;
+----+--------------+-------------+--------+--------------------+--------------+---------+------+------+-----------------------+
| id | select_type  | table       | type   | possible_keys      | key          | key_len | ref  | rows | Extra                 |
+----+--------------+-------------+--------+--------------------+--------------+---------+------+------+-----------------------+
|  1 | PRIMARY      | Country     | ALL    | PRIMARY            | NULL         | NULL    | NULL |  239 |                       |
|  1 | PRIMARY      | <subquery2> | eq_ref | distinct_key       | distinct_key | 3       | func |    1 |                       |
|  2 | MATERIALIZED | City        | range  | Population,Country | Population   | 4       | NULL |  238 | Using index condition |
+----+--------------+-------------+--------+--------------------+--------------+---------+------+------+-----------------------+
3 rows in set (0.00 sec)

Loose Scan策略

在Materialization/scan 策略时,需要先将"子查询结果集"移除重复记录并固化到临时表,再作为驱动表进行关联查询。MySQL特性Index Loose Scan能在一次扫描中得跳过重复索引键得到"没有重复记录的临时结果集",Loose Scan策略基于Index Loose Scan特性保证关联查询不会出现"重复关联问题"。

如对于查询:

SELECT * FROM Country  
WHERE Country.code IN (
    SELECT country_code FROM Satellite
)

如果Satellite.country_code 存在索引,基于Index Loose Scan特性则能快速获得"SELECT DISTINCT country_code FROM Satellite"的效果,如图所示:

使用Loose Scan 策略时,MariaDB 查询计划为:

MariaDB [world]> explain select * from Country where Country.code in (select country_code from Satellite);
+----+-------------+-----------+--------+---------------+--------------+---------+------------------------------+------+-------------------------------------+
| id | select_type | table     | type   | possible_keys | key          | key_len | ref                          | rows | Extra                               |
+----+-------------+-----------+--------+---------------+--------------+---------+------------------------------+------+-------------------------------------+
|  1 | PRIMARY     | Satellite | index  | country_code  | country_code | 9       | NULL                         |  932 | Using where; Using index; LooseScan |
|  1 | PRIMARY     | Country   | eq_ref | PRIMARY       | PRIMARY      | 3       | world.Satellite.country_code |    1 | Using index condition               |
+----+-------------+-----------+--------+---------------+--------------+---------+------------------------------+------+-------------------------------------+

Loose Scan 策略和Materialization/scan 策略区别:

  • Materialization/scan 策略:先将子查询的查询结果固化去重后,再作为驱动表与外部表进行关联查询,查询使用到临时表。
  • Loose Scan 策略:在对子查询的表进行Index Loose Scan操作过程中,直接将遍历到的记录与与外部表进行关联查询,查询未使用到临时表。

Duplicate Weedout策略

当无法根据表结构信息推算出"子查询结果集"不存在重复记录时,如果将子查询改写为关联查询,则会导致"外表记录"被关联匹配多次而产生重复记录,可以通过将关联结果集插入到"带有唯一索引的临时表"的方式来移除重复记录,保证最终查询结果的准确性。

对于查询:

SELECT * 
FROM Country 
WHERE Country.code IN (
    SELECT City.Country
    FROM City 
    WHERE City.Population > 0.33 * Country.Population 
    AND City.Population > 1*1000*1000
);

可以改写为:

CREATE tmp_Country LIKE Country;

INSERT IGNORE INTO tmp_Country
SELECT Country.* 
FROM Country
INNER JOIN City
ON Country.code = City.Country
WHERE City.Population > 0.33 * Country.Population 
AND City.Population > 1*1000*1000

SELECT * FROM tmp_Country;

如图所示:

使用Duplicate Weedout 策略时,MariaDB 查询计划为:

explain select * from Country where Country.code IN (select City.Country from City where City.Population > 0.33 * Country.Population and City.Population > 1*1000*1000)\G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: City
         type: range
possible_keys: Population,Country
          key: Population
      key_len: 4
          ref: NULL
         rows: 238
        Extra: Using index condition; Start temporary
*************************** 2. row ***************************
           id: 1
  select_type: PRIMARY
        table: Country
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 3
          ref: world.City.Country
         rows: 1
        Extra: Using where; End temporary
2 rows in set (0.00 sec)

学习总结

First Match策略通过"找到第一条匹配记录即返回"的方式来跳过无效子查询扫描。

除First Match策略外都是子查询转换为关联查询来优化提升查询效率,按照不同查询场景采用不同策略来"避免重复记录":

  • Table Pullout策略,通过唯一索引和主键索引逻辑来确认"子查询结果集"中重复记录。
  • Materialization策略,通过临时表来移除"子查询结果集"中重复记录。
  • Loose Scan策略,通过Index Loose Scan特性来跳过"子查询结果集"中重复记录。
  • Duplicate Weedout策略,通过临时表来将移除"关联查询结果集"中重复记录。

参考资料

Semi-join Subquery Optimizations

标签:City,Join,Semi,Country,PRIMARY,查询,Subquery,select,Population
From: https://www.cnblogs.com/gaogao67/p/17378214.html

相关文章

  • ASEMI代理ADI亚德诺LTC3309AEV#TRMPBF车规级芯片
    编辑-ZLTC3309AEV#TRMPBF特点:与LTC3307(3A)和LTC3308(4A)引脚兼容高效率:8mΩNMOS、31mΩPMOS可编程频率1MHz至3MHz微型电感器和电容器峰值电流模式控制22ns最短接通时间宽带宽,快速瞬态响应SilentSwitcher®架构超低EMI辐射低纹波突发模式®以40µA的IQ......
  • ASEMI代理亚德诺LTC2954CTS8-2#TRMPBF车规级芯片
    编辑-ZLTC2954CTS8-2#TRMPBF参数描述:型号:LTC2954CTS8-2#TRMPBF电源电压范围:2.7-26.4VVIN供电电流:6-12µAVIN欠电压锁定:2.2-2.5VVIN欠压锁定滞后:50-700mVONT上拉电流:–2.4to–3.6µAONT下拉电流:2.4-3.6µA内部开启解除抖动时间:26-41ms额外的可调节开启时间:9-13.5m......
  • ASEMI代理ADUM1250ARZ-RL7原装ADI车规级ADUM1250ARZ-RL7
    编辑:llASEMI代理ADUM1250ARZ-RL7原装ADI车规级ADUM1250ARZ-RL7型号:ADUM1250ARZ-RL7品牌:ADI/亚德诺封装:SOIC-8-150mil批号:2023+引脚数量:8工作温度:-40°C~105°C安装类型:表面贴装型ADUM1250ARZ-RL7汽车芯片ADUM1250ARZ-RL7特性双向I2C通信开放式排水接口适用于热插拔......
  • ASEMI代理ADUC834BSZ原装ADI车规级ADUC834BSZ
    编辑:llASEMI代理ADUC834BSZ原装ADI车规级ADUC834BSZ型号:ADUC834BSZ品牌:ADI/亚德诺封装:MQFP-52批号:2023+安装类型:表面贴装型引脚数量:8工作温度:-40°C~125°C类型:车规级芯片ADUC834BSZ特征高分辨率-ADC2个独立ADC(16位和24位分辨率)24位无缺失代码,主ADC21位rms(18.5位p......
  • JAVA中的两个容器StringBuilder和StringJoiner概述
    JAVA中的两个容器StringBuilder和StringJoiner概述StringBuilder可以看成一个容器,创建之后里面的内容是可以修改的方法名说明publicStringBuilderappend(任意类型)添加数据,并返回对象本身publicStringBuilderreverse()反转容器中的内容publicintlength()返......
  • ASEMI代理ADM3251EARWZ-REEL原装ADI车规级ADM3251EARWZ-REEL
    编辑:llASEMI代理ADM3251EARWZ-REEL原装ADI车规级ADM3251EARWZ-REEL型号:ADM3251EARWZ-REEL品牌:ADI/亚德诺封装:SOIC-20-300mil批号:2023+引脚数量:20工作温度:-40°C~85°C安装类型:表面贴装型ADM3251EARWZ-REEL汽车芯片ADM3251EARWZ-REEL特性2.5kV完全隔离(电源和数据......
  • ASEMI代理ADM3202ARUZ-REEL7原装ADI车规级ADM3202ARUZ-REEL7
    编辑:llASEMI代理ADM3202ARUZ-REEL7原装ADI车规级ADM3202ARUZ-REEL7型号:ADM3202ARUZ-REEL7品牌:ADI/亚德诺封装:TSSOP-16批号:2023+安装类型:表面贴装型引脚数量:16工作温度:VIORM:3.3V电源类型:车规级芯片ADM3202ARUZ-REEL7特征460kbps数据速率规定电压为3.3V符合EIA-23......
  • K8s报错:[preflight] WARNING: JoinControlPane.controlPlane settings will be ignore
    一、报错信息[preflight]WARNING:JoinControlPane.controlPlanesettingswillbeignoredwhencontrol-planeflagisnotset.[preflight]Runningpre-flightcheckserrorexecutionphasepreflight:[preflight]Somefatalerrorsoccurred:[ERRORFileAvailabl......
  • 线程常用方法join 和threadLocal
     从源码中可以得知,如果想要join方法正常生效,调用join方法的线程对象必须已经调用了start()方法并且未进入终止状态。扩展:从join方法的源码来看,join方法的本质调用的是Object中的wait方法实现线程的阻塞,wait方法的实现原理在后续的文章中在说详细阐述。**但是我们需要知道的是......
  • ASEMI代理ADI亚德诺ADM3051CRZ-REEL7车规级芯片
    编辑-ZADM3051CRZ-REEL7芯片参数:型号:ADM3051CRZ-REEL7显性状态:78mA隐性状态:10mA待命状态:275μACANH输出电压:4.5VCANL输出电压:2V差动输出电压:3V输入电压滞后:150mV差分输入电阻:100kΩ参考输出电压:3.025V最大数据速率:1MbpsVCC:−0.3Vto+7V工作温度范围:−40°C......