openGauss索引优化以及虚拟索引

标签：index ysl col1 索引虚拟 tab openGauss SELECT

一、索引推荐

1、测试数据导入

gsql -d database_test -p 26000 -U joe -W Mysql@123456 -r
CREATE TABLE tab_ysl_1 (col1 int, col2 int, col3 text);
INSERT INTO tab_ysl_1 VALUES(generate_series(1, 3000),generate_series(1, 3000),repeat( chr(int4(random()*26)+65),4));
ANALYZE tab_ysl_1;
CREATE TABLE tab_ysl_2 (col1 int, col2 int);
INSERT INTO tab_ysl_2 VALUES(generate_series(1, 1000),generate_series(1, 1000));
ANALYZE tab_ysl_2;

2、gs_index_advise函数

1.测试where
SELECT  * FROM gs_index_advise('SELECT * FROM tab_ysl_1 WHERE col1 = 10');
 schema |   table   | column | indextype
--------+-----------+--------+-----------
 joe    | tab_ysl_1 | col1   |
(1 row)

2.测试join
SELECT  * FROM gs_index_advise('SELECT * FROM tab_ysl_1 join tab_ysl_2 on tab_ysl_1.col1 = tab_ysl_2.col1');
 schema |   table   | column | indextype
--------+-----------+--------+-----------
 joe    | tab_ysl_1 | col1   |
 joe    | tab_ysl_2 |        |
(2 rows)

3.测试多表
SELECT  * FROM gs_index_advise('SELECT count(*), tab_ysl_2.col1 FROM tab_ysl_1 join tab_ysl_2 on tab_ysl_1.col2 = 
tab_ysl_2.col2 WHERE tab_ysl_2.col2 > 2 GROUP BY tab_ysl_2.col1 ORDER BY tab_ysl_2.col1');
 schema |   table   | column | indextype
--------+-----------+--------+-----------
 joe    | tab_ysl_1 | col2   |
 joe    | tab_ysl_2 | col1   |
(2 rows)

4.测试order by
SELECT  * FROM gs_index_advise('SELECT *, col2 FROM tab_ysl_1 ORDER BY 1, 3');
 schema |   table   |  column   | indextype
--------+-----------+-----------+-----------
 joe    | tab_ysl_1 | col1,col3 |
(1 row)
SELECT * FROM gs_index_advise('SELECT * FROM tab_ysl_1 WHERE col1 > 10 ORDER BY 1,col2');
 schema |   table   |  column   | indextype
--------+-----------+-----------+-----------
 joe    | tab_ysl_1 | col1,col2 |
(1 row)

5.测试过长字符串
SELECT  * FROM gs_index_advise('SELECT * FROM tab_ysl_1 where col3 in (''aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaa'',''bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb'',''cccccccccccccccccccccccccccccccccccccc
c'',''ddddddddddddddddddddddddddddddddddddddd'',''ffffffffffffffffffffffffffffffffffffffff'',''gggggggggggggggggggg
ggggggggggggggggggggggggggggggg'',''ttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttt'',''vvv
vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv'',''ggmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm
mmm'')');
 schema |   table   | column | indextype
--------+-----------+--------+-----------
 joe    | tab_ysl_1 | col3   |
(1 row)

3、Workload级别索引推荐

这种方式可以针对多条SQL,可以将待优化的SQL写到文件里，通过脚本获得推荐索引。
脚本目录在安装目录的bin/dbmind/index_advisor下边，我的目录为
/opt/gaussdb/app/bin/dbmind/index_advisor/index_advisor_workload.py
将待优化的SQL放到文件里
[omm@node1 index_advisor]$ cat 1.sql
SELECT * FROM tab_ysl_1 WHERE col1 = 10;
SELECT count(*), tab_ysl_2.col1 FROM tab_ysl_1 join tab_ysl_2 on tab_ysl_1.col2 = tab_ysl_2.col2 WHERE tab_ysl_2.col2 > 2 GROUP BY tab_ysl_2.col1 ORDER BY tab_ysl_2.col1;
SELECT * FROM tab_ysl_1 join tab_ysl_2 on tab_ysl_1.col1 = tab_ysl_2.col1;

使用如下方式调用脚本，可以批量获取推荐索引，26000为我的数据库端口，ysla为我的数据库名，1.sql为我待优化的SQL存放的文件
[omm@node1 index_advisor]$ pwd
/opt/gaussdb/app/bin/dbmind/index_advisor
[omm@node1 index_advisor]$ python3 ./index_advisor_workload.py 26000 database_test 1.sql
############################################################## Generate candidate indexes
table: tab_ysl_1 columns: col1
table: tab_ysl_1 columns: col2
table: tab_ysl_2 columns: col1
############################################################### Determine optimal indexes
create index ind0 on tab_ysl_1(col1);

4、索引效率查看

1、未优化的情况下
cat 1.sql
SELECT * FROM tab_ysl_1 WHERE col1 = 10;
[omm@gsdb01 ~]$ time gsql -d database_test -p 26000 -U joe -W Mysql@123456 -f 1.sql
 col1 | col2 | col3
------+------+------
   10 |   10 | HHHH
(1 row)
total time: 1  ms
real    0m0.020s
user    0m0.007s
sys     0m0.001s

2、分析并创建索引
[omm@node1 index_advisor]$ python3 ./index_advisor_workload.py 26000 database_test 1.sql
############################################################## Generate candidate indexes
table: tab_ysl_1 columns: col1
############################################################### Determine optimal indexes
create index ind0 on tab_ysl_1(col1);
通过Index-advisor获取推荐索引。并创建索引
gsql -d database_test -p 26000 -U joe -W Mysql@123456 -r -c "create index ind0 on tab_ysl_1(col1);"

[omm@gsdb01 ~]$ time gsql -d database_test -p 26000 -U joe -W Mysql@123456 -f 1.sql
 col1 | col2 | col3
------+------+------
   10 |   10 | HHHH
(1 row)
total time: 0  ms
real    0m0.019s
user    0m0.006s
sys     0m0.002s
#短了一丢丢，可能是数据量太少了，不明显

5、总结

1、未添加索引的查询效率
time gsql -d tpch -p 26000 -c "select * from lineitem where 1_orderkey < 100 and 1_suppkey > 50;"
real 0m4.916s
user 0m0.014s
sys  0m0.001s

2、使用索引推荐函数gs_index_advise获取优化建议
select * from gs_index_advise('select * from lineitem where 1_orderkey < 100 and 1_suppkey > 50;')
table    |  column
lineitem |  (1_orderkey)

3、创建索引
create index idx1 on lineitem(1_orderkey);

4、查看优化结果
time gsql -d tpch -p 26000 -c "select * from lineitem where 1_orderkey < 100 and 1_suppkey > 50;"

单索引推荐：适用于表中数据量大的情况，数据量过小不会进行推荐。
（1）当where中的查询条件只有一个的时候，推荐单一索引，如只有id在where中，只推荐id为索引；当where中的查询条件有多个的时候推荐多重索
引，如id，name在where中被当做条件，则一起被推荐为联合索引，但是如果同时存在id、person_id则默认推荐id（ 目前不知道原因）
（2）当query中除了where这个语句时，还存在order by 和 group by等条件时，将where、order by、group by中的属性全部作为联合索引进行
推荐。
（3）使用like模糊查询时或精准查询都不对该属性进行索引建立，并且=属性一定给予索引推荐建立。
（4）当query中的条件过多时存在的属性也超过三个时，依旧推荐联合属性是在三个以上，会不会导致推荐索引过多从而性能下降，这个有待商榷，最
好推荐索引中的属性在三个以内最好。不过这个可能需要通过DRL来学习，判断究竟选择一条query中的哪几个属性来建立索引。

二、虚拟索引

一般在加索引时，会堵塞DML（不过PG支持并发加索引，不堵塞DML）。只有索引真正能起到优化作用，我们建立索引才是有意义的。虚拟索引是一个很有用的东西，没有副作用，只是虚拟的索引，建立虚拟索引后，可以通过EXPLAIN来查看加索引后的成本估算，判断是否加索引COST会降低。

可以用虚拟索引检验索引的效果，根据效果可选择是否创建真实的索引优化查询。

1、清理之前的索引
gsql -d database_test -p 26000 -U joe -W Mysql@123456 -r
\d+ tab_ysl_1
drop index ind0;
\d+ tab_ysl_1
                        Table "joe.tab_ysl_1"
 Column |  Type   | Modifiers | Storage  | Stats target | Description
--------+---------+-----------+----------+--------------+-------------
 col1   | integer |           | plain    |              |
 col2   | integer |           | plain    |              |
 col3   | text    |           | extended |              |
Has OIDs: no
Options: orientation=row, compression=no

2、#测试建立虚拟索引（hypopg_create_index）
SELECT * FROM hypopg_create_index('CREATE INDEX ON tab_ysl_1(col1)');
 indexrelid |          indexname
------------+-----------------------------
      54410 | <54410>btree_tab_ysl_1_col1
(1 row)

3、显示所有创建的虚拟索引信息（enable_hypo_index）
select * from hypopg_display_index();
          indexname          | indexrelid |   table   | column
-----------------------------+------------+-----------+--------
 <54410>btree_tab_ysl_1_col1 |      54410 | tab_ysl_1 | (col1)
(1 row)

4、执行分析
set enable_hypo_index = on;explain SELECT * FROM tab_ysl_1 WHERE col1 = 100;
SET
                                          QUERY PLAN
----------------------------------------------------------------------------------------------
 Index Scan using <54410>btree_tab_ysl_1_col1 on tab_ysl_1  (cost=0.00..8.27 rows=1 width=13)
   Index Cond: (col1 = 100)
(2 rows)

5、测试删除指定虚拟索引（hypopg_display_index）
使用函数hypopg_drop_index删除指定oid的虚拟索引
 select * from hypopg_drop_index(54410);
 hypopg_drop_index
-------------------
 t
(1 row)

6、再次查看虚拟索引
database_test=> select * from hypopg_display_index();
 indexname | indexrelid | table | column
-----------+------------+-------+--------
(0 rows)

7、#使用函数hypopg_reset_index一次性清除所有创建的虚拟索引
SELECT * FROM hypopg_reset_index();
 hypopg_reset_index
--------------------

标签：index,ysl,col1,索引,虚拟,tab,openGauss,SELECT
From： https://blog.51cto.com/u_13236892/6178927