先简单说一下最左原则
顾名思义:
1、最左优先,以最左边的为起点任何连续的索引都能匹配上。同时遇到范围查询(>,<,between,like)就会停止匹配。
2、例如:b=2如果建立(a,b)顺序的索引,是匹配不到(a,b)索引的;
但如果查询条件是a=1 and b=2或者a=1(又或者是b=2 and b=1)就可以,因为优化器会自动调整a,b的顺序。
3、再比如a=1 and b=2 and c>3 and d=4 如果建立(a,b,c,d)顺序的索引,d是用不到索引的,因为c字段是一个范围查询,它之后的字段就会停止匹配。
测试案例如下:
表名 |
总行数 |
原始大小 |
压缩大小 |
压缩率 |
test_event_combine ORDER BY (pid, logtime)
|
355820549 |
91.19 GiB |
6.83 GiB |
7 |
test_logtime_index ORDER BY (logtime, pid) |
355820549 |
91.19 GiB |
6.85 GiB |
7 |
建表语句如下(测试数据约3.5亿条):
CREATE TABLE test.test_event_combine
|
CREATE TABLE test. test_logtime_index |
查询效率对比:
|
SELECT count(pid) AS num |
1 rows in set. Elapsed: 0.014 sec. Processed 425.98 thousand rows, 1.70 MB (30.02 million rows/s., 120.10 MB/s.) | 1 rows in set. Elapsed: 0.989 sec. Processed 355.82 million rows, 1.42 GB (359.72 million rows/s., 1.44 GB/s.) |
结论:当order by pid字段在首位的时候,比在第二个字段快了近100倍左右。
执行计划如下:
执行计划 (order by pid,logtime)
|
执行计划 (order by logtime,pid) "Plan": { |
结论:观察执行计划发现都包含索引"Keys": ["pid"],但是Selected Granules两者差异比较大;看计划即使当字段在第二位走索引了,但是还是扫描的全表数据,可以理解为索引失效。
进行投影优化:
优化追加投影 ALTER TABLE test.test_logtime_index ADD PROJECTION norm_projection8 (SELECT * ORDER BY pid); ALTER TABLE test.test_logtime_index MATERIALIZE PROJECTION norm_projection8; 1 rows in set. Elapsed: 0.057 sec. Processed 483.26 thousand rows, 1.93 MB (8.41 million rows/s., 33.63 MB/s.)
观察执行计划:
执行计划: "Plan": { "Node Type": "Expression", "Description": "(Projection + Before ORDER BY)", "Plans": [ { "Node Type": "Aggregating", "Plans": [ { "Node Type": "Expression", "Description": "Before GROUP BY", "Plans": [ { "Node Type": "SettingQuotaAndLimits", "Description": "Set limits and quota after reading from storage", "Plans": [ { "Node Type": "ReadFromStorage", "Description": "MergeTree(with Normal projection norm_projection8)"
投影后表变化:
结论:观察执行计划,发现本次执行新增了投影"Description": "MergeTree(with Normal projection norm_projection8)"计划。但是表大小增加了一倍左右
标签:Node,String,pid,order,test,Type,mergetree,ClickHouse,Plans From: https://www.cnblogs.com/-courage/p/16802970.html