首页 > 其他分享 >2022-10-12 myql-Optimizing IN and EXISTS Subquery Predicates with Semijoin Transformations

2022-10-12 myql-Optimizing IN and EXISTS Subquery Predicates with Semijoin Transformations

时间:2022-10-16 16:06:52浏览次数:72  
标签:10 Optimizing 12 ... subquery semijoin table class SELECT


8.2.2.1 Optimizing IN and EXISTS Subquery Predicates with Semijoin Transformations


A semijoin is a preparation-time transformation that enables multiple execution strategies such as table pullout, duplicate weedout, first match, loose scan, and materialization. The optimizer uses semijoin strategies to improve subquery execution, as described in this section.

For an inner join between two tables, the join returns a row from one table as many times as there are matches in the other table. But for some questions, the only information that matters is whether there is a match, not the number of matches. Suppose that there are tables named ​​class​​​ and ​​roster​​ that list classes in a course curriculum and class rosters (students enrolled in each class), respectively. To list the classes that actually have students enrolled, you could use this join:

SELECT class.class_num, class.class_name
FROM class
INNER JOIN roster
WHERE class.class_num = roster.class_num;

However, the result lists each class once for each enrolled student. For the question being asked, this is unnecessary duplication of information.

Assuming that ​​class_num​​​ is a primary key in the ​​class​​​ table, duplicate suppression is possible by using ​​SELECT DISTINCT​​, but it is inefficient to generate all matching rows first only to eliminate duplicates later.

The same duplicate-free result can be obtained by using a subquery:

SELECT class_num, class_name
FROM class
WHERE class_num IN
(SELECT class_num FROM roster);

Here, the optimizer can recognize that the ​​IN​​​ clause requires the subquery to return only one instance of each class number from the ​​roster​​​ table. In this case, the query can use a semijoin; that is, an operation that returns only one instance of each row in ​​class​​​ that is matched by rows in ​​roster​​.

The following statement, which contains an ​​EXISTS​​​ subquery predicate, is equivalent to the previous statement containing an ​​IN​​ subquery predicate:

SELECT class_num, class_name
FROM class
WHERE EXISTS
(SELECT * FROM roster WHERE class.class_num = roster.class_num);

In MySQL 8.0.16 and later, any statement with an ​​EXISTS​​​ subquery predicate is subject to the same semijoin transforms as a statement with an equivalent ​​IN​​ subquery predicate.

Beginning with MySQL 8.0.17, the following subqueries are transformed into antijoins:

  • ​NOT IN (SELECT ... FROM ...)​
  • ​NOT EXISTS (SELECT ... FROM ...)​​.
  • ​IN (SELECT ... FROM ...) IS NOT TRUE​
  • ​EXISTS (SELECT ... FROM ...) IS NOT TRUE​​.
  • ​IN (SELECT ... FROM ...) IS FALSE​
  • ​EXISTS (SELECT ... FROM ...) IS FALSE​​.

In short, any negation of a subquery of the form ​​IN (SELECT ... FROM ...)​​​ or ​​EXISTS (SELECT ... FROM ...)​​ is transformed into an antijoin.

An antijoin is an operation that returns only rows for which there is no match. Consider the query shown here:

SELECT class_num, class_name
FROM class
WHERE class_num NOT IN
(SELECT class_num FROM roster);

This query is rewritten internally as the antijoin ​​SELECT class_num, class_name FROM class ANTIJOIN roster ON class_num​​​, which returns one instance of each row in ​​class​​ that is not matched by any rows in ​​roster​​​. This means that, for each row in ​​class​​​, as soon as a match is found in ​​roster​​​, the row in ​​class​​ can be discarded.

Antijoin transformations cannot in most cases be applied if the expressions being compared are nullable. An exception to this rule is that ​​(... NOT IN (SELECT ...)) IS NOT FALSE​​​ and its equivalent ​​(... IN (SELECT ...)) IS NOT TRUE​​ can be transformed into antijoins.

Outer join and inner join syntax is permitted in the outer query specification, and table references may be base tables, derived tables, view references, or common table expressions.

In MySQL, a subquery must satisfy these criteria to be handled as a semijoin (or, in MySQL 8.0.17 and later, an antijoin if ​​NOT​​ modifies the subquery):

  • It must be part of an​​IN​​, ​​= ANY​​, or ​​EXISTS​​ predicate that appears at the top level of the ​​WHERE​​ or ​​ON​​ clause, possibly as a term in an ​​AND​​ expression. For example:
SELECT ...
FROM ot1, ...
WHERE (oe1, ...) IN
(SELECT ie1, ... FROM it1, ... WHERE ...);

Here, ​​ot_i​​ and ​​it_i​​ represent tables in the outer and inner parts of the query, and ​​oe_i​​ and ​​ie_i​ represent expressions that refer to columns in the outer and inner tables.

In MySQL 8.0.17 and later, the subquery can also be the argument to an expression modified by ​​NOT​​​, ​​IS [NOT] TRUE​​​, or ​​IS [NOT] FALSE​​.

  • It must be a single​​SELECT​​​ without​​UNION​​ constructs.
  • It must not contain a​​HAVING​​ clause.
  • It must not contain any aggregate functions (whether it is explicitly or implicitly grouped).
  • It must not have a​​LIMIT​​ clause.
  • The statement must not use the​​STRAIGHT_JOIN​​ join type in the outer query.
  • The​​STRAIGHT_JOIN​​ modifier must not be present.
  • The number of outer and inner tables together must be less than the maximum number of tables permitted in a join.
  • The subquery may be correlated or uncorrelated. In MySQL 8.0.16 and later, decorrelation looks at trivially correlated predicates in the​​WHERE​​​ clause of a subquery used as the argument to​​EXISTS​​​, and makes it possible to optimize it as if it was used within​​IN (SELECT b FROM ...)​​. The termtrivially correlatedmeans that the predicate is an equality predicate, that it is the sole predicate in the​​WHERE​​​ clause (or is combined with​​AND​​), and that one operand is from a table referenced in the subquery and the other operand is from the outer query block.
  • The​​DISTINCT​​ keyword is permitted but ignored. Semijoin strategies automatically handle duplicate removal.
  • A​​GROUP BY​​ clause is permitted but ignored, unless the subquery also contains one or more aggregate functions.
  • An​​ORDER BY​​ clause is permitted but ignored, since ordering is irrelevant to the eval(or, in MySQL 8.0.17 or later, an antijoin if applicable) and makes a cost-based choice from these strategies:
    • Convert the subquery to a join, or use table pullout and run the query as an inner join between subquery tables and outer tables. Table pullout pulls a table out from the subquery to the outer query.
    • Duplicate Weedout: Run the semijoin as if it was a join and remove duplicate records using a temporary table.
    • FirstMatch: When scanning the inner tables for row combinations and there are multiple instances of a given value group, choose one rather than returning them all. This "shortcuts" scanning and eliminates production of unnecessary rows.
    • LooseScan: Scan a subquery table using an index that enables a single value to be chosen from each subquery's value group.
    • Materialize the subquery into an indexed temporary table that is used to perform a join, where the index is used to remove duplicates. The index might also be used later for lookups when joining the temporary table with the outer tables; if not, the table is scanned. For more information about materialization, see​​Section 8.2.2.2, “Optimizing Subqueries with Materialization”​​.

    Each of these strategies can be enabled or disabled using the following ​​optimizer_switch​​ system variable flags:

    These flags are enabled by default. See ​​Section 8.9.2, “Switchable Optimizations”​​.

    The optimizer minimizes differences in handling of views and derived tables. This affects queries that use the ​​STRAIGHT_JOIN​​​ modifier and a view with an ​​IN​​ subquery that can be converted to a semijoin. The following query illustrates this because the change in processing causes a change in transformation, and thus a different execution strategy:

    CREATE VIEW v AS
    SELECT *
    FROM t1
    WHERE a IN (SELECT b
    FROM t2);

    SELECT STRAIGHT_JOIN *
    FROM t3 JOIN v ON t3.x = v.a;

    The optimizer first looks at the view and converts the ​​IN​​​ subquery to a semijoin, then checks whether it is possible to merge the view into the outer query. Because the ​​STRAIGHT_JOIN​​ modifier in the outer query prevents semijoin, the optimizer refuses the merge, causing derived table eval(See​​Section 8.8.3, “Extended EXPLAIN Output Format”​​​.) From this you can get an idea about which tables were pulled out of the semijoin. If a subquery was converted to a semijoin, you should see that the subquery predicate is gone and its tables and​​WHERE​​​ clause were merged into the outer query join list and​​WHERE​​ clause.

  • Temporary table use for Duplicate Weedout is indicated by​​Start temporary​​​ and​​End temporary​​​ in the​​Extra​​​ column. Tables that were not pulled out and are in the range of​​EXPLAIN​​​ output rows covered by​​Start temporary​​​ and​​End temporary​​​ have their​​rowid​​ in the temporary table.
  • ​FirstMatch(tbl_name)​​​ in the​​Extra​​ column indicates join shortcutting.
  • ​LooseScan(m..n)​​​ in the​​Extra​​​ column indicates use of the LooseScan strategy. ​m​​ and ​n
  • Temporary table use for materialization is indicated by rows with a​​select_type​​​ value of​​MATERIALIZED​​​ and rows with a​​table​​​ value of​​<subqueryN>​​.

In MySQL 8.0.21 and later, a semijoin transformation can also be applied to a single-table ​​UPDATE​​​ or ​​DELETE​​​ statement that uses a ​​[NOT] IN​​​ or ​​[NOT] EXISTS​​​ subquery predicate, provided that the statement does not use ​​ORDER BY​​​ or ​​LIMIT​​​, and that semijoin transformations are allowed by an optimizer hint or by the ​​optimizer_switch​​ setting.

标签:10,Optimizing,12,...,subquery,semijoin,table,class,SELECT
From: https://blog.51cto.com/adofsauron/5760338

相关文章

  • 10个常见的前端手写功能,你全都会吗?
    万丈高楼平地起,地基打的牢,才能永远立于不败之地。今天给大家带来的是10个常见的JavaScript手写功能,重要的地方已添加注释。有的是借鉴别人的,有的是自己写的,如有不正确的地......
  • 【100个 Unity实用技能】| Unity中设置 允许的最大帧数,锁定游戏的最大帧率(游戏锁帧)
    Unity小科普老规矩,先介绍一下Unity的科普小知识:Unity是实时3D互动内容创作和运营平台。包括游戏开发、美术、建筑、汽车设计、影视在内的所有创作者,借助Unity将创意变......
  • 10.12~13测验反思
    整体情况:不好。当时是下午最后一节课,可能是太困了还是不知道什么原因,没什么兴致做题。开题,前五个填空很快就搞定了。然后第一个问题就冒出来了:花了太长时间在第六题上。......
  • P2482 [SDOI2010] 猪国杀
    P2482[SDOI2010]猪国杀AC于2022年8月9日,代码写了一个晚上点击查看代码#include<stdio.h>#include<string.h>constintN=15,M=2005;structPig{ charty......
  • 123按位取反是多少?原码、反码、补码及其运算
    如题,在整数运算中总是不清楚某个数的取反和反码到底有什么区别,遂写下此博客,有参考的地方在文末中会贴出出处。在阅读本文章之后会对你了解计算机中一些基础有所帮助,文章包......
  • Exam Results Gym - 102769E
    https://vjudge.net/problem/Gym-102769E#include<cstdio>#include<iostream>#include<cstring>#include<algorithm>usingnamespacestd;typedeflonglongll;......
  • SQL1001N is not a valid database name. SQLSTATE=2E000
    $db2createdbhcba_appstoreusingcodesetUTF-8territoryCNCOLLATEUSINGSYSTEM出现错误SQL1001N"hcba_appstore"isnotavaliddatabasename.SQLSTATE=2E0......
  • 2022.10.16小记
    今天心情好差,不想计算倒计时了,感觉又开始迷茫了今天小姐妹结婚,可惜我不能去参加婚礼关于一件浪漫的事,分扣太多就不要了吧,到此为止明天要学仰泳了周日了,抓紧时间弥......
  • CF1712D
    首先有一个比较明显的性质,直径最多经过两条边。设全局最小值为\(m\)。考虑从任意一点出发,经过全局最小值所在的点,然后再到任意点,花费是\(2\timesm\)的。而任意边权\(......
  • 10.16
       第一次报错   报错语句   第二次报错   执行语句   第三次报错   执行语句   并且除了第一次报错外,每次运行都成功生......