首页 > 其他分享 >ClickHouse测试之mergetree中的order by字段是否符合最左原则

ClickHouse测试之mergetree中的order by字段是否符合最左原则

时间:2022-10-18 16:23:59浏览次数:58  
标签:Node String pid order test Type mergetree ClickHouse Plans

先简单说一下最左原则

顾名思义:
1、最左优先,以最左边的为起点任何连续的索引都能匹配上。同时遇到范围查询(>,<,between,like)就会停止匹配。

2、例如:b=2如果建立(a,b)顺序的索引,是匹配不到(a,b)索引的;
但如果查询条件是a=1 and b=2或者a=1(又或者是b=2 and b=1)就可以,因为优化器会自动调整a,b的顺序。

3、再比如a=1 and b=2 and c>3 and d=4 如果建立(a,b,c,d)顺序的索引,d是用不到索引的,因为c字段是一个范围查询,它之后的字段就会停止匹配。

 

测试案例如下:

表名

总行数

原始大小

压缩大小

压缩率

test_event_combine

ORDER BY (pid, logtime)

 

355820549

91.19 GiB

6.83 GiB

7

test_logtime_index

ORDER BY (logtime, pid)

355820549

 91.19 GiB

 6.85 GiB

7

建表语句如下(测试数据约3.5亿条):


 

CREATE TABLE test.test_event_combine
(
`pid` Int32,
`role_uid` Int64,
`group_base` Int32,
`plosgn` Int8,
`pl` String,
`plname` String,
`server_id` Int32,
`os` String,
`uuid` String,
`gn` String,
`state` String,
`ip` String,
`logtime` Int64,
`log_type` String,
`event` String,
`kv` String
)
ENGINE = MergeTree
ORDER BY (pid, logtime)
SETTINGS index_granularity = 8192

 

 

CREATE TABLE test. test_logtime_index
(
`pid` Int32,
`role_uid` Int64,
`group_base` Int32,
`plosgn` Int8,
`pl` String,
`plname` String,
`server_id` Int32,
`os` String,
`uuid` String,
`gn` String,
`state` String,
`ip` String,
`logtime` Int64,
`log_type` String,
`event` String,
`kv` String
)
ENGINE = MergeTree
ORDER BY (logtime,pid)
SETTINGS index_granularity = 8192

查询效率对比:


SELECT count(pid) AS num
FROM test.test_event_combine
WHERE pid in(
6609291
,182776
,7016351
,336626
,121096
,522472
,6997365
,197394
,4979687
,6898037
,2543060
,7011696
,6687321
) ;

SELECT count(pid) AS num
FROM test.test_logtime_index
WHERE pid in(
6609291
,182776
,7016351
,336626
,121096
,522472
,6997365
,197394
,4979687
,6898037
,2543060
,7011696
,6687321
) ;

1 rows in set. Elapsed: 0.014 sec. Processed 425.98 thousand rows, 1.70 MB (30.02 million rows/s., 120.10 MB/s.) 1 rows in set. Elapsed: 0.989 sec. Processed 355.82 million rows, 1.42 GB (359.72 million rows/s., 1.44 GB/s.)

结论:当order by pid字段在首位的时候,比在第二个字段快了近100倍左右。

 执行计划如下:

执行计划 (order by pid,logtime)


"Plan": {
"Node Type": "Expression",
"Description": "(Projection + Before ORDER BY)",
"Plans": [
{
"Node Type": "Aggregating",
"Plans": [
{
"Node Type": "Expression",
"Description": "Before GROUP BY",
"Plans": [
{
"Node Type": "Filter",
"Description": "WHERE",
"Plans": [
{
"Node Type": "SettingQuotaAndLimits",
"Description": "Set limits and quota after reading from storage",
"Plans": [
{
"Node Type": "ReadFromMergeTree",
"Indexes": [
{
"Type": "PrimaryKey",
"Keys": ["pid"],
"Condition": "(pid in 13-element set)",
"Initial Parts": 6,
"Selected Parts": 6,
"Initial Granules": 43525,
"Selected Granules": 52

执行计划 (order by logtime,pid)

"Plan": {
"Node Type": "Expression",
"Description": "(Projection + Before ORDER BY)",
"Plans": [
{
"Node Type": "Aggregating",
"Plans": [
{
"Node Type": "Expression",
"Description": "Before GROUP BY",
"Plans": [
{
"Node Type": "Filter",
"Description": "WHERE",
"Plans": [
{
"Node Type": "SettingQuotaAndLimits",
"Description": "Set limits and quota after reading from storage",
"Plans": [
{
"Node Type": "ReadFromMergeTree",
"Indexes": [
{
"Type": "PrimaryKey",
"Keys": ["pid"],
"Condition": "(pid in 13-element set)",
"Initial Parts": 5,
"Selected Parts": 5,
"Initial Granules": 43504,
"Selected Granules": 43504

结论:观察执行计划发现都包含索引"Keys": ["pid"],但是Selected Granules两者差异比较大;看计划即使当字段在第二位走索引了,但是还是扫描的全表数据,可以理解为索引失效。

进行投影优化:

优化追加投影

ALTER TABLE test.test_logtime_index ADD PROJECTION norm_projection8
(SELECT * ORDER BY pid);

ALTER TABLE test.test_logtime_index MATERIALIZE PROJECTION norm_projection8;

1 rows in set. Elapsed: 0.057 sec. Processed 483.26 thousand rows, 1.93 MB (8.41 million rows/s., 33.63 MB/s.)

  观察执行计划:

执行计划:

    "Plan": {
      "Node Type": "Expression",
      "Description": "(Projection + Before ORDER BY)",
      "Plans": [
        {
          "Node Type": "Aggregating",
          "Plans": [
            {
              "Node Type": "Expression",
              "Description": "Before GROUP BY",
              "Plans": [
                {
                  "Node Type": "SettingQuotaAndLimits",
                  "Description": "Set limits and quota after reading from storage",
                  "Plans": [
                    {
                      "Node Type": "ReadFromStorage",
                      "Description": "MergeTree(with Normal projection norm_projection8)"

  投影后表变化:

结论:观察执行计划,发现本次执行新增了投影"Description": "MergeTree(with Normal projection norm_projection8)"计划。但是表大小增加了一倍左右

标签:Node,String,pid,order,test,Type,mergetree,ClickHouse,Plans
From: https://www.cnblogs.com/-courage/p/16802970.html

相关文章

  • clickhouse时间日期函数
    一、常用的时间戳到日期,日期到时间戳操作SELECTtoDateTime(1665936000,'Asia/Shanghai')`时间戳转为:%Y-%m-%d%H:%M:%S`,toUnixTimestamp('2022-10-17')AS`......
  • python+clickhouse
    1.关于使用客户端能链接成功而使用代码无法访问clickhouse的问题clikchouse有两种链接方式:http端口默认为8123,tcp默认接口为9000使用客户端链接时用8123端口,而代码中就......
  • web前端-css边框(border)
    ......
  • 论文笔记 - Fantastically Ordered Prompts and Where to Find Them: Overcoming Few-
    prompt的影响因素MotivationPrompt中Example的排列顺序对模型性能有较大影响(即使已经校准参见好的情况下,选取不同的排列顺序依然会有很大的方差):校准可以大幅度提......
  • WHY CHINA’S CROSS-BORDER E-CO
    ​​https://fulfillmentbridge.com/cross-border-ecommerce-trade-war/​​WHYCHINA’SCROSS-BORDERE-COMMERCEMARKETISSURGING?At$5.8trillionlastyear,C......
  • C++ hash in #include <unordered_map>
    #pragmaonce#pragmacomment(lib,"rpcrt4.lib")#include<Windows.h>#include<rpcdce.h>#include<iostream>#include<thread>#include<unordered_map>using......
  • DEMO: BAPI_SALESORDER_CREATEFROMDAT2 创建销售订单
    货铺QQ群号:834508274进群统一修改群名片,例如BJ_ABAP_森林木。群内禁止发广告及其他一切无关链接,小程序等,进群看公告,谢谢配合不修改昵称会被不定期踢除,谢谢配合下面开始干货:R......
  • css3中的圆角border-radius
    css3的属性应该加上浏览器前缀不同的浏览器内核不同,因此css的前缀不同。常见的几种浏览器内核火狐浏览器Geoko内核前缀是-mox-谷歌浏览器,Webkit内核......
  • Pytest插件pytest-order指定用例顺序
    Pytest插件pytest-order指定用例顺序安装 pipinstallpytest-order注意不是pytest-ordering说起来这里有个故事 关于pytest-ordering和pytest-orderhttps......
  • DEMO: BAPI_SALESORDER_CREATEFROMDAT2 创建订单
    REPORTzdemo_va01.PARAMETERSp_kunnrTYPEkunnrDEFAULT'1004615'.PARAMETERSp_vkorgTYPEvkorgDEFAULT'S600'.PARAMETERSp_vtwegTYPEvtwegDEFAULT'10'.PARAM......