首页 > 数据库 >KingabseES优化SQL_关联倾斜数据

KingabseES优化SQL_关联倾斜数据

时间:2023-02-28 17:58:17浏览次数:37  
标签:KingabseES rows .. tp01 关联 cost istype SQL loops

KingabseES优化SQL_关联倾斜数据

背景

对于两个大表关联的场景,如果过滤条件的列值,存在高度倾斜,可以考虑根据反向滤值,进行过滤操作,减少连接的CPU时间。

数据准备

-- 状态表 tp01_state 记录 大表tp01 记录的多种状态 

kingbase=# select count(*) from tp01;
 count
----------
10000000
(1 行记录)

--只有一个高度倾斜的列值
kingbase=# select issuc,count(*) from tp01_state group by issuc order by issuc;

issuc |  count
-------+---------
N     |     100
Y     | 9999900
(2 行记录)

--有多个高度倾斜的列值
kingbase=# select istype, count(*) from tp01_state group by istype order by istype;
istype |  count
--------+---------
A      |     100
C      | 8999700
G      |     100
M      | 1000000
W      |     100
(5 行记录)


查询issuc='Y'数据

标准语句

多数数据匹配issuc='Y'条件,执行计划就是两个大表,进行hashjoin。

select *
from tp01
where id in (select id from tp01_state where issuc = 'Y');
--或者
select *
from tp01
where exists(select 1 from tp01_state where id = tp01.id and issuc = 'Y');

-- QUERY PLAN
Hash Semi Join  (cost=338555.00..1033383.15 rows=10000000 width=241) (actual time=2398.867..5889.537 rows=9999900 loops=1)
  Hash Cond: (tp01.id = tp01_state.id)
  ->  Seq Scan on tp01  (cost=0.00..444828.12 rows=10000012 width=241) (actual time=0.005..611.596 rows=10000000 loops=1)
  ->  Hash  (cost=213555.00..213555.00 rows=10000000 width=4) (actual time=2384.857..2384.858 rows=9999900 loops=1)
        Buckets: 16777216  Batches: 1  Memory Usage: 482631kB
        ->  Seq Scan on tp01_state  (cost=0.00..213555.00 rows=10000000 width=4) (actual time=0.011..775.853 rows=9999900 loops=1)
              Filter: (issuc = 'Y'::text)
              Rows Removed by Filter: 100
Planning Time: 0.186 ms
Execution Time: 6137.233 ms


优化1:使用NOT IN 代替 IN

因为只有少量数据,匹配issuc='Y'反向条件,使用not in 减少大表的过滤操作。

select *
from tp01
where tp01.id not in (select id from tp01_state where issuc <> 'Y' or issuc is null);

-- QUERY PLAN
Seq Scan on tp01  (cost=213555.00..683383.15 rows=5000006 width=241) (actual time=517.554..1629.795 rows=9999900 loops=1)
  Filter: (NOT (hashed SubPlan 1))
  Rows Removed by Filter: 100
  SubPlan 1
    ->  Seq Scan on tp01_state  (cost=0.00..213555.00 rows=1 width=4) (actual time=271.143..517.503 rows=100 loops=1)
          Filter: (issuc <> 'Y'::text)
          Rows Removed by Filter: 9999900
Planning Time: 0.087 ms
Execution Time: 1870.376 ms

优化2:使用not between 代替 <>

not between 操作可以使用索引,就可以减少子查询的执行时间。

select *
from tp01
where tp01.id not in (select id from tp01_state where issuc not between 'Y' and 'Y' or issuc is null);

-- QUERY PLAN
Seq Scan on tp01  (cost=17.35..469845.50 rows=5000006 width=241) (actual time=0.098..1109.085 rows=9999900 loops=1)
  Filter: (NOT (hashed SubPlan 1))
  Rows Removed by Filter: 100
  SubPlan 1
    ->  Bitmap Heap Scan on tp01_state  (cost=13.33..17.34 rows=1 width=4) (actual time=0.035..0.045 rows=100 loops=1)
          Recheck Cond: ((issuc < 'Y'::text) OR (issuc > 'Y'::text) OR (issuc IS NULL))
          Heap Blocks: exact=2
          ->  BitmapOr  (cost=13.33..13.33 rows=1 width=0) (actual time=0.028..0.030 rows=0 loops=1)
                ->  Bitmap Index Scan on tp01_state_issuc  (cost=0.00..4.44 rows=1 width=0) (actual time=0.020..0.020 rows=100 loops=1)
                      Index Cond: (issuc < 'Y'::text)
                ->  Bitmap Index Scan on tp01_state_issuc  (cost=0.00..4.44 rows=1 width=0) (actual time=0.007..0.007 rows=0 loops=1)
                      Index Cond: (issuc > 'Y'::text)
                ->  Bitmap Index Scan on tp01_state_issuc  (cost=0.00..4.44 rows=1 width=0) (actual time=0.001..0.001 rows=0 loops=1)
                      Index Cond: (issuc IS NULL)
Planning Time: 0.109 ms
Execution Time: 1349.526 ms

查询istype in ('C','M')数据

标准语句

多数数据匹配istype in ('C','M')条件,执行计划就是两个大表,进行hashjoin。

explain analyze
select *
from tp01
where id in (select id from tp01_state where istype in ('C', 'M'));

-- QUERY PLAN
Hash Semi Join  (cost=307305.11..927445.94 rows=7500009 width=241) (actual time=2848.058..6398.654 rows=9999700 loops=1)
  Hash Cond: (tp01.id = tp01_state.id)
  ->  Seq Scan on tp01  (cost=0.00..444828.12 rows=10000012 width=241) (actual time=0.005..613.502 rows=10000000 loops=1)
  ->  Hash  (cost=213555.00..213555.00 rows=7500009 width=4) (actual time=2840.972..2840.972 rows=9999700 loops=1)
        Buckets: 16777216 (originally 8388608)  Batches: 1 (originally 1)  Memory Usage: 482624kB
        ->  Seq Scan on tp01_state  (cost=0.00..213555.00 rows=7500009 width=4) (actual time=0.034..1032.910 rows=9999700 loops=1)
              Filter: (istype = ANY ('{C,M}'::text[]))
              Rows Removed by Filter: 300
Planning Time: 0.193 ms
Execution Time: 6646.452 ms

优化1:使用NOT IN 代替 IN

因为只有少量数据,匹配istype in ('C','M')反向条件,使用not in 减少大表的过滤操作。

select *
from tp01
where id not in (select id from tp01_state where istype not in ('C', 'M') or istype is null );

-- QUERY PLAN
Seq Scan on tp01  (cost=175497.98..645326.13 rows=5000006 width=241) (actual time=778.116..2699.271 rows=9999700 loops=1)
  Filter: (NOT (hashed SubPlan 1))
  Rows Removed by Filter: 300
  SubPlan 1
    ->  Seq Scan on tp01_state  (cost=0.00..169248.00 rows=2499991 width=4) (actual time=0.006..767.589 rows=300 loops=1)
          Filter: ((istype <> ALL ('{C,M}'::text[])) OR (istype IS NULL))
          Rows Removed by Filter: 9999700
Planning Time: 0.101 ms
Execution Time: 2934.265 ms

优化2:使用not between 代替 <>

not between 操作根据选择率最佳的列值,使用索引,就可以减少子查询的执行时间。

select *
from tp01
where id not in (select id from tp01_state
where (istype not between 'C' and 'C' and istype not between 'M' and 'M') or istype is null);


-- QUERY PLAN
Seq Scan on tp01  (cost=106721.48..576549.63 rows=5000006 width=241) (actual time=223.295..1862.006 rows=9999700 loops=1)
  Filter: (NOT (hashed SubPlan 1))
  Rows Removed by Filter: 300
  SubPlan 1
    ->  Bitmap Heap Scan on tp01_state  (cost=22927.80..104507.84 rows=885454 width=4) (actual time=58.615..220.275 rows=300 loops=1)
          Recheck Cond: (((istype < 'C'::text) OR (istype > 'C'::text)) OR (istype IS NULL))
          Filter: ((((istype < 'C'::text) OR (istype > 'C'::text)) AND ((istype < 'M'::text) OR (istype > 'M'::text))) OR (istype IS NULL))
          Rows Removed by Filter: 1000000
          Heap Blocks: exact=4428
          ->  BitmapOr  (cost=22927.80..22927.80 rows=981652 width=0) (actual time=58.266..58.268 rows=0 loops=1)
                ->  BitmapOr  (cost=22701.99..22701.99 rows=981652 width=0) (actual time=58.262..58.263 rows=0 loops=1)
                      ->  Bitmap Index Scan on tp01_state_istype  (cost=0.00..5.69 rows=167 width=0) (actual time=0.026..0.027 rows=100 loops=1)
                            Index Cond: (istype < 'C'::text)
                      ->  Bitmap Index Scan on tp01_state_istype  (cost=0.00..22253.58 rows=981486 width=0) (actual time=58.235..58.235 rows=1000200 loops=1)
                            Index Cond: (istype > 'C'::text)
                ->  Bitmap Index Scan on tp01_state_istype  (cost=0.00..4.44 rows=1 width=0) (actual time=0.004..0.004 rows=0 loops=1)
                      Index Cond: (istype IS NULL)
Planning Time: 0.350 ms
Execution Time: 2099.544 ms

优化3:使用<和>的范围条件组合,代替not between

将多个not between条件,分解成范围条件组合,充分利用索引,减少filter操作。

select *
from tp01
where id not in (
    select id
    from tp01_state
    where (istype < 'C')
       or (istype > 'C' and istype < 'M')
       or (istype > 'M')
       or istype is null);

-- QUERY PLAN
Seq Scan on tp01  (cost=350.11..470178.26 rows=5000006 width=241) (actual time=0.142..1099.829 rows=9999700 loops=1)
  Filter: (NOT (hashed SubPlan 1))
  Rows Removed by Filter: 300
  SubPlan 1
    ->  Bitmap Heap Scan on tp01_state  (cost=8.60..349.28 rows=334 width=4) (actual time=0.067..0.091 rows=300 loops=1)
          Recheck Cond: ((istype < 'C'::text) OR ((istype > 'C'::text) AND (istype < 'M'::text)) OR (istype > 'M'::text) OR (istype IS NULL))
          Heap Blocks: exact=2
          ->  BitmapOr  (cost=8.60..8.60 rows=334 width=0) (actual time=0.058..0.060 rows=0 loops=1)
                ->  Bitmap Index Scan on tp01_state_istype  (cost=0.00..2.69 rows=167 width=0) (actual time=0.019..0.019 rows=100 loops=1)
                      Index Cond: (istype < 'C'::text)
                ->  Bitmap Index Scan on tp01_state_istype  (cost=0.00..1.45 rows=1 width=0) (actual time=0.024..0.024 rows=100 loops=1)
                      Index Cond: ((istype > 'C'::text) AND (istype < 'M'::text))
                ->  Bitmap Index Scan on tp01_state_istype  (cost=0.00..2.69 rows=167 width=0) (actual time=0.014..0.014 rows=100 loops=1)
                      Index Cond: (istype > 'M'::text)
                ->  Bitmap Index Scan on tp01_state_istype  (cost=0.00..1.44 rows=1 width=0) (actual time=0.001..0.001 rows=0 loops=1)
                      Index Cond: (istype IS NULL)
Planning Time: 0.183 ms
Execution Time: 1340.184 ms


总结

查询优化的宗旨,是更少的数据量和更少计算量,不要摒弃 not in这样不易优化的操作符。

标签:KingabseES,rows,..,tp01,关联,cost,istype,SQL,loops
From: https://www.cnblogs.com/kingbase/p/17051305.html

相关文章

  • MySql创建存储过程
    #1.插入数据前先把表中的索引去掉,数据插入完成之后,再创建索引#2.关闭事务的自动提交DROPPROCEDUREifEXISTSBatchInsert;#分隔符delimiter$$CREATEPROCEDURE......
  • win11环境 cmd 命令窗口 sqlplus 命令无响应
    此问题疑似path环境变量过长导致,安装过程中已有类似提示 之前我是删除了部分环境变量后通过校验。安装完成后把path删除的环境变量再加上去隔天重启服务器后发现 ......
  • mysql-索引
    一、索引回表:B+树索引主要可以分为两种索引,聚集索引和非聚集索引。聚集索引:也就是平常我们说的主键索引,在B+树中叶子节点存的是整行数据。非聚集索引:也叫二级索引,也......
  • C#文件转Byte存储到SQLServer数据库
    场景:临时不同的系统,相同的数据库,通用文件读取1.数据库字段采用:varbinary(max)类型2.代码注:这里获取文件是通过先将byte[]数据先存到当前服务器对应项目的文件夹中(加了......
  • sql 开窗函数排序遇到空值的处理办法
    sqlsever默认null最小升序排序null值在最前面,若要放在后面,则:orderbycasewhencolisnullthen1else0end,col降序排序null值在最后面,若要放在前面,则:order......
  • sql lag函数
    laghttps://spark.apache.org/docs/latest/api/sql/#laglag(input[,offset[,default]])OVER(PARTITIONBYaORDERBYb)input-求lag的目标字段offset-lag步......
  • PostgreSQL备份脚本
    1.直接上代码#!/bin/bash#数据库名称db_name="odoo14e"#存放备份文件的url地址base_url="/home/odoo/pgsql_backup/backup_files/"#判断是否存在存放备份文......
  • 解决SQL Server 删除大量数据后磁盘空间无变化
    造成空间不变化的原因InnoDB中采用了B+树作为存储数据的结构,也就是常说的索引组织表。在InnoDB中,delete操作并不会真的删除数据,SQLServer实际上只是给要删除的数据打......
  • Student(含有sql操作封装方法,登录验证)(重要)(常敲敲)
    BaseDao数据库操作工具,直接可以拿来用。记得要继承。那个StudentUserInfo里面没信息。因为StudentInfo包含学号和密码就直接用了packagedao;importjava.sql.*;/**数据库......
  • MySQL 内连接、外连接、全连接
    连接查询用于多表关联查询,连接方式不同,查询重点不同。内连接作用:查询多表之间交集部分数据关键词:innerjoin可简写为:join,即innerjoin=join图示:左外......