首页 > 其他分享 >KingbaseES V8R6 索引膨胀

KingbaseES V8R6 索引膨胀

时间:2023-01-09 16:14:56浏览次数:46  
标签:index V8R6 idx 索引 pg test KingbaseES split

索引膨胀

对于索引,随着业务不断的增删改,会造成膨胀,尤其Btree索引,也会涉及索引分裂、合并等,导致索引访问效率降低、维护成本增加。另外,索引页的复用与HEAP PAGE不一样,因为索引的内容是有序结构,只有符合顺序的ITEM才能插入对应的PAGE中,不像HEAP TUPLE,只要有空间就可以插入。index page无论在任何位置,都不能从磁盘删除,因此索引变大后,不能回收空间,除非vacuum full。所以索引膨胀后,通常需要重建索引来回收索引空间。

此外,对于B树索引,新构建的索引比多次更新的索引访问速度稍快,因为逻辑上相邻的页面通常在新构建索引中也是物理上相邻的。为了提高访问速度,定期重新B_Tree 索引可能是值得的。

对于重建索引,REINDEX在任何情况下都可以安全方便地使用。默认情况下,该命令需要ACCESS EXCLUSIVE锁。可以使用CONCURRENTLY选项创建索引,该选项只需要SHARE UPDATE EXCLUSIV锁,不阻塞读写。

索引膨胀的原因:

1.大量删除发生后,导致索引页面稀疏,降低了索引使用效率。

2.长时间运行的事务,阻止了vacuum对表的清理工作,因而导致页面稀疏状态一直保持。

3.索引字段乱序写入,导致索引频繁分裂,使得索引页并不是百分百填满,所以膨胀。

查询获取每个表的行数、索引以及有关这些索引的一些信息:

SELECT
    pg_class.relname,
    pg_size_pretty(pg_class.reltuples::bigint)            AS rows_in_bytes,
    pg_class.reltuples                                    AS num_rows,
    COUNT(*)                                              AS total_indexes,
    COUNT(*) FILTER ( WHERE indisunique)                  AS unique_indexes,
    COUNT(*) FILTER ( WHERE indnatts = 1 )                AS single_column_indexes,
    COUNT(*) FILTER ( WHERE indnatts IS DISTINCT FROM 1 ) AS multi_column_indexes
FROM
    pg_namespace
    LEFT JOIN pg_class ON pg_namespace.oid = pg_class.relnamespace
    LEFT JOIN pg_index ON pg_class.oid = pg_index.indrelid
WHERE
    pg_namespace.nspname = 'public' AND
    pg_class.relkind = 'r'
GROUP BY pg_class.relname, pg_class.reltuples
ORDER BY pg_class.reltuples DESC;
SELECT
    t.schemaname,
    t.tablename,
    c.reltuples::bigint                            AS num_rows,
    pg_size_pretty(pg_relation_size(c.oid))        AS table_size,
    psai.indexrelname                              AS index_name,
    pg_size_pretty(pg_relation_size(i.indexrelid)) AS index_size,
    CASE WHEN i.indisunique THEN 'Y' ELSE 'N' END  AS "unique",
    psai.idx_scan                                  AS number_of_scans,
    psai.idx_tup_read                              AS tuples_read,
    psai.idx_tup_fetch                             AS tuples_fetched
FROM
    pg_tables t
    LEFT JOIN pg_class c ON t.tablename = c.relname
    LEFT JOIN pg_index i ON c.oid = i.indrelid
    LEFT JOIN pg_stat_all_indexes psai ON i.indexrelid = psai.indexrelid
WHERE
    t.schemaname NOT IN ('pg_catalog', 'information_schema')
ORDER BY 1, 2;

查找重复索引,查找具有相同列集、相同操作类、表达式和谓词的多个索引,但需要人为判断需要删除的重复项:

SELECT pg_size_pretty(sum(pg_relation_size(idx))::bigint) as size,
       (array_agg(idx))[1] as idx1, (array_agg(idx))[2] as idx2,
       (array_agg(idx))[3] as idx3, (array_agg(idx))[4] as idx4
FROM (
    SELECT indexrelid::regclass as idx, (indrelid::text ||E'\n'|| indclass::text ||E'\n'|| indkey::text ||E'\n'||coalesce(indexprs::text,'')||E'\n' || coalesce(indpred::text,'')) as key
    FROM pg_index) sub
GROUP BY key HAVING count(*)>1
ORDER BY sum(pg_relation_size(idx)) DESC;

查找未使用的索引:

select 
    indexrelid::regclass as index, relid::regclass as table 
from 
    pg_stat_user_indexes 
    JOIN pg_index USING (indexrelid) 
where 
    idx_scan = 0 and indisunique is false;

有些场景,重建索引后,索引就变小了。通常这种情况是索引字段乱序写入,导致索引频繁分裂,使得索引页并不是百分百填满,密度低,索引页面浪费。

索引碎片模拟

乱序写入:

test=# create table t_split(id int);  
CREATE TABLE  
test=# create index idx_split on t_split (id);  
CREATE INDEX  
test=# insert into t_split select random()*1000000 from generate_series(1,1000000);  
INSERT 0 1000000  
test=# \di+ idx_split   
                          List of relations
 Schema |   Name    | Type  | Owner  |  Table  | Size  | Description
--------+-----------+-------+--------+---------+-------+-------------
 public | idx_split | index | system | t_split | 30 MB |
 
(1 row)

顺序写入:

test=# truncate t_split ;  
TRUNCATE TABLE  
test=# \di+ idx_split   
                            List of relations
 Schema |   Name    | Type  | Owner  |  Table  |    Size    | Description
--------+-----------+-------+--------+---------+------------+-------------
 public | idx_split | index | system | t_split | 8192 bytes |

(1 row)  
  
test=# insert into t_split select generate_series(1,1000000);  
INSERT 0 1000000  
test=# \di+ idx_split   
                          List of relations
 Schema |   Name    | Type  | Owner  |  Table  | Size  | Description
--------+-----------+-------+--------+---------+-------+-------------
 public | idx_split | index | system | t_split | 22 MB |

(1 row)

先写入数据,后建索引:

test=# drop index idx_split ;  
DROP INDEX  
test=# create index idx_split on t_split (id);  
CREATE INDEX  
test=# \di+ idx_split   
                           List of relations
 Schema |   Name    | Type  | Owner  |  Table  | Size  | Description
--------+-----------+-------+--------+---------+-------+-------------
 public | idx_split | index | system | t_split | 22 MB |
(1 row)


业务运行久了,不断的增删改,也会导致索引碎片:

test=# create table test(id int);
CREATE TABLE
test=# insert into test values(generate_series(1,1000000));
INSERT 0 1000000
test=# create index idx_fragmented on test(id);
CREATE INDEX
CREATE EXTENSION kbstattuple;
刚刚创建的索引没有碎片:
test=# \x
Expanded display is on.
test=# SELECT * FROM pgstatindex('idx_fragmented');
-[ RECORD 1 ]------+---------
version            | 4
tree_level         | 2
index_size         | 22609920
root_block_no      | 289
internal_pages     | 11
leaf_pages         | 2748
empty_pages        | 0
deleted_pages      | 0
avg_leaf_density   | 89.93
leaf_fragmentation | 0


leaf_fragmentation的碎片率是33.33%:
test=# insert into test values(generate_series(1,1000000));
INSERT 0 1000000
test=# SELECT * FROM pgstatindex('idx_fragmented');
-[ RECORD 1 ]------+---------
version            | 4
tree_level         | 2
index_size         | 67846144
root_block_no      | 289
internal_pages     | 39
leaf_pages         | 8242
empty_pages        | 0
deleted_pages      | 0
avg_leaf_density   | 60.06
leaf_fragmentation | 33.33


reindex之后,即可回收空间,减少碎片。

test=# reindex index idx_fragmented;
REINDEX
test=# \di+ idx_fragmented
List of relations
-[ RECORD 1 ]---------------
Schema      | public
Name        | idx_fragmented
Type        | index
Owner       | system
Table       | test
Size        | 43 MB
Description |

test=# SELECT * FROM pgstatindex('idx_fragmented');
-[ RECORD 1 ]------+---------
version            | 4
tree_level         | 2
index_size         | 45236224
root_block_no      | 208
internal_pages     | 26
leaf_pages         | 5495
empty_pages        | 0
deleted_pages      | 0
avg_leaf_density   | 90.01
leaf_fragmentation | 0

总结

通过以上方法监控索引膨胀,以及索引碎片情况,及时对索引reindex 进行碎片优化,建议不要在一个表上建太多索引,准确评估经常update的列和经常select的列,以便创建合适的索引。

标签:index,V8R6,idx,索引,pg,test,KingbaseES,split
From: https://www.cnblogs.com/kingbase/p/16942027.html

相关文章

  • KingbaseES V8R6 中unlogged表
    前言KingbaseESV8R6有一种表称为unlogged,在该表新建的索引也属于unlogged。和普通表的区别是,对该表进行DML操作时候不将该表的变更记录变更写入到wal文件中。在数据库异常......
  • 你有想过,如何用Hadoop实现【倒排索引】?
    写在前面: 博主是一名大数据的初学者,昵称来源于《爱丽丝梦游仙境》中的Alice和自己的昵称。作为一名互联网小白,​​写博客一方面是为了记录自己的学习历程,一方面是希望能够......
  • MySQL21 - 索引
    索引什么是索引索引相当于一本书的目录,为了缩小扫描范围而存在、提高查询效率的一种数据结构一张表的一个字段可以添加索引,多个字段也可以联合起来添加索引在任何数据......
  • 倒排索引的 JAVA 简单实现
      倒排索引的简单JAVA实现,当玩具其实都很粗糙,简单实现下原理:publicclassIntertedIndex{//倒排索引privateMap<String,List<String>>indexMa......
  • ubuntu系统更新或者卸载软件导致索引失败的抢救方法
    #现将info文件夹更名sudomv/var/lib/dpkg/info/var/lib/dpkg/info_old#再新建一个新的info文件夹sudomkdir/var/lib/dpkg/info#更新安装......
  • Reduce the indexing time and CPU load with pre-built JDK shared indexes(idea打开
    Intellijidea出现如下提示:ReducetheindexingtimeandCPUloadwithpre-builtJDKsharedindexes解决方法: idea为了能够更快的使用共享索引,会下载JDK和Maven库的共享......
  • MySQl中的冷门索引类型
    MySQL中的冷门索引类型一、Hash索引概念​基于哈希表实现,只有匹配所有列的查询才有效。对于每一行数据,存储引擎都会对所有索引列计算一个哈希码,哈希码是一个较小的......
  • Sql Server 索引使用情况
    --查询数据库索引碎片状况usedbname;SELECTOBJECT_NAME(ind.OBJECT_ID)ASTableName, ind.nameASIndexName,indexstats.index_type_descASIndexType, index......
  • (四)elasticsearch 源码之索引流程分析
    1.概览前面我们讨论了es是如何启动,本文研究下es是如何索引文档的。下面是启动流程图,我们按照流程图的顺序依次描述。  其中主要类的关系如下:  2.索引流程......
  • SQLSERVER 的复合索引和包含索引到底有啥区别?
    一:背景1.讲故事在SQLSERVER中有非常多的索引,比如:聚集索引,非聚集索引,唯一索引,复合索引,Include索引,交叉索引,连接索引,奇葩索引等等,当索引多了之后很容易傻傻的分不清,比如......