首页 > 数据库 >ClickHouse SQL调优及执行计划分析

ClickHouse SQL调优及执行计划分析

时间:2023-02-03 14:25:55浏览次数:51  
标签:bid SQL cluster select aid dw 调优及 local ClickHouse

ClickHouse在做SQL查询时要尽量遵循的原则

1.大表在左,小表在右,否则会造成右表加载数据量太大,大量消耗内存资源;

2.如果join的右表为大表,则需要将右表写成子查询,在子查询中将右表的条件都加上,并进行列裁剪,这样可以有效减少数据加载;

3.where条件中只放左表的条件,如果放右表的条件将在下推阶段右表条件不会生效,将右表条件放到join的子查询中去。

select ...
from t_all
join ( -- 右表本身直接走本地表
  select ...
  from t_local
  where t_local.filter = xxx -- 尽可能手动将条件放在子查询中
)
where
  t_local.f = xxx -- 当前版本不支持自动下推到JOIN查询中,需要手动修改
  and t_all.f in (
    select ... from xxx -- 若能将子查询作为筛选条件更佳
  )

ClickHouse执行计划分析

此执行计划分析是在多分片单副本的ClickHouse环境中执行的。

准备数据

-- 建表语句
CREATE TABLE dw_local.t_a on cluster cluster_name (
`aid` Int64,
`score` Int64,
`shard` String,
`_sign` Int8,
`_version` UInt64
) ENGINE = ReplicatedReplacingMergeTree('/clickhouse/table/{shard}/t_a', '{replica}')
ORDER BY (aid);
CREATE TABLE dw_dist.t_a on cluster cluster_name as dw_local.t_a ENGINE = Distributed('cluster_name', 'dw_local', 't_a', sipHash64(shard));
CREATE VIEW dw.t_a on cluster cluster_name as select * from dw_dist.t_a final where _sign = 1;

CREATE TABLE dw_local.t_b on cluster cluster_name (
`bid` Int64,
`aid` Int64,
`shard` String,
`_sign` Int8,
`_version` UInt64
) ENGINE = ReplicatedReplacingMergeTree('/clickhouse/table/{shard}/t_b', '{replica}')
ORDER BY (bid);
CREATE TABLE dw_dist.t_b on cluster cluster_name as dw_local.t_b ENGINE = Distributed('cluster_name', 'dw_local', 't_b', sipHash64(shard));
CREATE VIEW dw.t_b on cluster cluster_name as select * from dw_dist.t_b final where _sign = 1;


-- 插入数据
insert into dw_dist.t_a (aid, score, shard, _sign, _version) values(1, 1, 's1', 1, 1);
insert into dw_dist.t_a (aid, score, shard, _sign, _version) values(2, 1, 's2', 1, 2);
insert into dw_dist.t_b (bid, aid, shard, _sign, _version) values(1, 1, 's1', 1, 1);
insert into dw_dist.t_b (bid, aid, shard, _sign, _version) values(2, 2, 's2', 1, 2);
insert into dw_dist.t_b (bid, aid, shard, _sign, _version) values(3, 0, 's1', 1, 3);
insert into dw_dist.t_b (bid, aid, shard, _sign, _version) values(4, 1, 's2', 1, 4);

案例1

explain select a.aid as aid, b.bid as bid, count(*) as ct from dw.t_a as a join dw.t_b as b on a.aid = b.aid group by aid, bid;

通过执行计划可以看到,左表和右表的执行计划基本一样,先将远程节点上的表数据拉取到本地,然后和本地表的数据进行union操作,然后左表和右表进行join操作,最后进行group by操作。

案例2

explain select a.aid as aid, b.bid as bid, count(*) as ct from dw.t_b as b join dw.t_a as a on a.aid = b.aid group by aid, bid;

此SQL和案例1SQL相比只是将左右表的顺序调换,此次表的执行顺序只是调换了一下,从左到右执行。

案例3

explain select a.aid as aid, b.bid as bid, count(*) as ct from dw.t_a as a join dw.t_b as b on a.aid = b.aid where a.aid=1 group by aid, bid;

此SQL和案例1SQL相比添加了左表的where条件,执行计划中将条件放在了左表进行数据加载。

案例4

explain select a.aid as aid, b.bid as bid, count(*) as ct from dw.t_b as b join dw.t_a as a on a.aid = b.aid where a.aid=1 group by aid, bid;

此SQL中的左表表无过滤条件,右表存在过滤条件,执行计划并不会将右表的条件放到拉取数据的阶段。

案例5

explain select a.aid as aid, b.bid as bid, count(*) as ct from dw_local.t_a as a join dw_local.t_b as b on a.aid = b.aid where a.aid=1 group by aid, bid;

如果所有表都采用本地表,而不是分布式表,则在数据加载阶段不会拉取远程节点的数据。

案例6

explain select a.aid as aid, b.bid as bid, count(*) as ct from dw_local.t_b as b join dw_local.t_a as a on a.aid = b.aid where a.aid=1 group by aid, bid;

调换左右表顺序,同案例5结果相同。

案例7

explain select a.aid as aid, b.bid as bid, count(*) as ct from (
    select * from dw.t_a
) as a join dw.t_b as b on a.aid = b.aid group by aid, bid;

左表为子查询时,左表也会走分布式表查询方式拉取数据,有文章说左表为子查询的情况下会导致左表走本地查询策略,不走分布式查询策略,可能是版本原因,我们测试过程中没有出现类似情况。

 

标签:bid,SQL,cluster,select,aid,dw,调优及,local,ClickHouse
From: https://www.cnblogs.com/chuijingjing/p/17087160.html

相关文章

  • SQL Server 索引碎片维护
    一、产生原因及影响索引是数据库引擎中针对表(有时候也针对视图)建立的特别数据结构,用来帮助查找和整理数据,它的重要性体现在能够使数据库引擎快速返回查询结果。当对索引......
  • MySQL数据库的安装(安装以及简单数据库操作)
    有关mysql数据库 MySQL是目前比较流行的关系型数据库管理系统之一,其体积小、速度快、总体拥有成本低,尤其是开放源码这一特点,深受中小型网站用户的欢迎。安装方式一​​点击......
  • KingbaseES PLSQL 支持语句级回滚
    KingbaseES默认如果在PLSQL-block执行过程中的任何SQL语句导致错误,都会导致该事务的所有语句都被回滚,而Oracle则是语句级的回滚。KingbaseES为了更好的与Oracle兼容,新......
  • 09-SQL语句的生成-概览
    orm与SQLorm的最重要的作用是帮助我们从繁重的sql书写中解放出来,依据灵活的查询的条件动态生成sql。毕竟数据库最终查询数据也是只认sql。所以搞明白sql的生成尤......
  • mysql必知必会——GROUP BY和HAVING
    GROUPBY语法可以根据给定数据列的每个成员对查询结果进行分组统计,最终得到一个分组汇总表。select子句中的列名必须为分组列或列函数,列函数对于groupby子句定义的每个组......
  • Ubantu 安装 MySQL 8.0 教程
    第一步:更新软件包$sudoaptupdate第二步:安装mysql8.0$sudoaptinstallmysql-server-8.0-y第三步:查询安装的mysql版本$mysql--versionmysqlVer8.0......
  • MySQL创建用户和授权
    MySQL创建用户和授权--语法createuser'用户名'@'主机'identifiedby'密码';--实例createuser'rayfoo'@'%'identifiedby'123456';--刷新,立即生效flushpr......
  • SQL Server新建维护计划
    1.连接数据库,点击维护计划->新建维护计划2.选择最左边的备份数据库、清除历史记录、清除维护。选中以后直接往右拖到设计器图面。3.效果4.选中备份数据库右击->编辑......
  • SqlServer2008R2锁机制
    1.性能低下的update会怎么样?(1).使用原始的person表,插入6条数据,由于是4000字节,所以两条数据就是一个数据页droptabledbo.person;createtableperson(idintidentity,nam......
  • SQL语句的执行顺序怎么理解?
    书写顺序SELECT->DISTINCT->FROM->JOIN->ON->WHERE->GROUPBY->HAVING->ORDERBY->LIMIT执行顺序FROM->JOIN->ON->WHERE->GROUPBY->H......