首页 > 数据库 >POSTGRESQL (PG) 6种索引类型介绍以及使用实例

POSTGRESQL (PG) 6种索引类型介绍以及使用实例

时间:2024-03-07 13:48:02浏览次数:35  
标签:Index rows POSTGRESQL .. t1 索引 PG id

Postgresql中主要支持6种类型的索引:BTREE、HASH、GiST、SP-GiSP、GIN、BRIN。可以根据实际的应用场景选择合适的索引,BTREE、HASH是比较常用的索引。

1. BTREE索引:
CREATE INDEX默认使用BTREE索引,适合按照顺序存储的数据进行比较查询和范围查询,查询优化器会优先考虑使用BTREE索引,如果涉及到以下任何一种操作:

1)<,<=,=,>,>=

2)以及这些操作的组合,比如between and,也可以使用BTREE。

3)在索引列上的IS NULL 或者IS NOT NULL也可以使用BTREE。

4)BTREE索引也可以用于模糊查询,但是仅限字符串开头是常量的情况下,比如 name LIKE ‘Jason%’,或者name ~ ’^Jason’。但是name LIKE ‘%Jason’是不能用的。

5)Min/Max聚集操作也可使用BTREE索引。

6)其实在merge join以及order by中,可以通过使用BTREE索引的有序性来减少sort带来的代价。

例子:

test=# create table t1 (id int, info text);
CREATE TABLE
test=# insert into t1 values(generate_series(1,100000), md5(random()::text));
INSERT 0 100000
test=# \d t1
Table "public.t1"
Column | Type | Collation | Nullable | Default
--------+---------+-----------+----------+---------
id | integer | | |
info | text | | |

test=# analyze t1;
ANALYZE
--不建立索引,默认使用顺序扫描
test=# explain select * from t1 where t1.id = 10007;
QUERY PLAN
------------------------------------------------------
Seq Scan on t1 (cost=0.00..2084.00 rows=1 width=37)
Filter: (id = 10007)
(2 rows)

test=# explain select * from t1 where t1.id >10007;
QUERY PLAN
----------------------------------------------------------
Seq Scan on t1 (cost=0.00..2084.00 rows=90249 width=37)
Filter: (id > 10007)
(2 rows)

test=# explain select * from t1 where t1.id > 10007 and t1.id < 12000;
QUERY PLAN
---------------------------------------------------------
Seq Scan on t1 (cost=0.00..2334.00 rows=2042 width=37)
Filter: ((id > 10007) AND (id < 12000))
(2 rows)
--建立BTREE索引
test=# create index on t1(id);
CREATE INDEX
test=# analyze t1;
ANALYZE
test=# explain select * from t1 where t1.id = 10007;
QUERY PLAN
---------------------------------------------------------------------
Index Scan using t1_id_idx on t1 (cost=0.29..8.31 rows=1 width=37)
Index Cond: (id = 10007)
(2 rows)
--下面例子中没有使用索引的原因是选择率太高,优化器会使用顺序扫描
test=# explain select * from t1 where t1.id >10007;
QUERY PLAN
----------------------------------------------------------
Seq Scan on t1 (cost=0.00..2084.00 rows=90103 width=37)
Filter: (id > 10007)
(2 rows)

test=# explain select * from t1 where t1.id > 10007 and t1.id < 12000;
QUERY PLAN
-------------------------------------------------------------------------
Index Scan using t1_id_idx on t1 (cost=0.29..83.73 rows=1972 width=37)
Index Cond: ((id > 10007) AND (id < 12000))
(2 rows)

test=# explain select * from t1 where t1.id >98765;
QUERY PLAN
-------------------------------------------------------------------------
Index Scan using t1_id_idx on t1 (cost=0.29..51.31 rows=1201 width=37)
Index Cond: (id > 98765)
(2 rows)

test=# \d t1
Table "public.t1"
Column | Type | Collation | Nullable | Default
--------+---------+-----------+----------+---------
id | integer | | |
info | text | | |
Indexes:
"t1_id_idx" btree (id)

test=# explain select * from t1 where t1.id between 10007 and 11000;
QUERY PLAN
------------------------------------------------------------------------
Index Scan using t1_id_idx on t1 (cost=0.29..43.79 rows=975 width=37)
Index Cond: ((id >= 10007) AND (id <= 11000))
(2 rows)

test=# explain select * from t1 where t1.id IS NULL;
QUERY PLAN
---------------------------------------------------------------------
Index Scan using t1_id_idx on t1 (cost=0.29..4.31 rows=1 width=37)
Index Cond: (id IS NULL)
(2 rows)

test=# create index on t1(info text_pattern_ops);
CREATE INDEX
test=# \d t1;
Table "public.t1"
Column | Type | Collation | Nullable | Default
--------+---------+-----------+----------+---------
id | integer | | |
info | text | | |
Indexes:
"t1_id_idx" btree (id)
"t1_info_idx" btree (info text_pattern_ops)

test=# analyze t1;
ANALYZE
--模糊查询使用索引
test=# explain select * from t1 where t1.info like '0123%';
QUERY PLAN
------------------------------------------------------------------------
Index Scan using t1_info_idx on t1 (cost=0.42..8.44 rows=10 width=37)
Index Cond: ((info ~>=~ '0123'::text) AND (info ~<~ '0124'::text))
Filter: (info ~~ '0123%'::text)
(3 rows)

test=# explain analyze select min(id) from t1;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
Result (cost=0.33..0.34 rows=1 width=4) (actual time=0.143..0.143 rows=1 loops=1)
InitPlan 1 (returns $0)
-> Limit (cost=0.29..0.33 rows=1 width=4) (actual time=0.114..0.116 rows=1 loops=1)
-> Index Only Scan using t1_id_idx on t1 (cost=0.29..3691.29 rows=100000 width=4) (actual time=0.107..0.107 rows=1 loops=1)
Index Cond: (id IS NOT NULL)
Heap Fetches: 1
Planning Time: 0.531 ms
Execution Time: 0.263 ms
(8 rows)

test=# explain analyze select max(id) from t1;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
Result (cost=0.33..0.34 rows=1 width=4) (actual time=0.054..0.054 rows=1 loops=1)
InitPlan 1 (returns $0)
-> Limit (cost=0.29..0.33 rows=1 width=4) (actual time=0.043..0.044 rows=1 loops=1)
-> Index Only Scan Backward using t1_id_idx on t1 (cost=0.29..3691.29 rows=100000 width=4) (actual time=0.040..0.041 rows=1 loops=1)
Index Cond: (id IS NOT NULL)
Heap Fetches: 1
Planning Time: 0.485 ms
Execution Time: 0.128 ms
(8 rows)

test=#

2. Hash索引:
只能处理简单的等值比较,当索引列涉及到等于操作比较时,优化器会考虑使用Hash索引。Hash索引是通过比较hash值来查找定位,如果hash索引列的数据重复度比较高,容易产生严重的hash冲突,从而降低查询效率,因此这种情况下,不适合hash索引。

CREATE INDEX idx_name ON table_name USING HASH (column_name);

例子:

test=# create table t2 (id int, info text);
CREATE TABLE
test=# insert into t2 values(generate_series(1,100000), md5(random()::text));
INSERT 0 100000
test=# create index on t2 using hash(id);
CREATE INDEX
test=# analyze t2;
ANALYZE
test=# \d t2;
Table "public.t2"
Column | Type | Collation | Nullable | Default
--------+---------+-----------+----------+---------
id | integer | | |
info | text | | |
Indexes:
"t2_id_idx" hash (id)

test=# explain select * from t2 where id = 10008;
QUERY PLAN
---------------------------------------------------------------------
Index Scan using t2_id_idx on t2 (cost=0.00..8.02 rows=1 width=37)
Index Cond: (id = 10008)
(2 rows)
--非等于操作不会用到hash索引
test=# explain select * from t2 where id < 10008;
QUERY PLAN
---------------------------------------------------------
Seq Scan on t2 (cost=0.00..2084.00 rows=9826 width=37)
Filter: (id < 10008)
(2 rows)

test=# explain select * from t2 where id is NULL;
QUERY PLAN
------------------------------------------------------
Seq Scan on t2 (cost=0.00..1834.00 rows=1 width=37)
Filter: (id IS NULL)
(2 rows)

test=#

3. GiST索引
不是独立的索引类型,是一种架构或者索引模板,是一棵平衡二叉树。适用于多维数据类型和集合数据类型,和Btree索引类似,同样适用于其他的数据类型。GiST可以用来做位置搜索,如包含、相交、左边、右边等。和Btree索引相比,GiST多字段索引在查询条件中包含索引字段的任何子集都会使用索引扫描,而Btree索引只有查询条件包含第一个索引字段才会使用索引扫描。GiST索引特定操作符类型高度依赖于索引策略(操作符类)。GiST跟Btree索引相比,索引创建耗时较长,占用空间也比较大。

下面例子中建立了BTREE组合索引(a, b),如果SQL where条件中有a或者a,b都可以使用该组合索引,但是如果where条件中只有b,则无法使用索引。此时,GiST可以解决这种情况。

例子:

test=# create table t3(a bigint, b timestamp without time zone,c varchar(64));
CREATE TABLE ^
test=# insert into t3 values(generate_series(1,100000), now()::timestamp, md5(random()::text));
INSERT 0 100000
test=# create index on t3(a, b);
CREATE INDEX
test=# analyze t3;
ANALYZE
test=# \d t3
Table "public.t3"
Column | Type | Collation | Nullable | Default
--------+-----------------------------+-----------+----------+---------
a | bigint | | |
b | timestamp without time zone | | |
c | character varying(64) | | |
Indexes:
"t3_a_b_idx" btree (a, b)

test=# explain select * from t3 where a = 10000;
QUERY PLAN
----------------------------------------------------------------------
Index Scan using t3_a_b_idx on t3 (cost=0.42..8.44 rows=1 width=49)
Index Cond: (a = 10000)
(2 rows)

test=# explain select * from t3 where b = '2022-11-18 17:50:29.245683';
QUERY PLAN
---------------------------------------------------------------------------
Seq Scan on t3 (cost=0.00..2281.00 rows=1 width=49)
Filter: (b = '2022-11-18 17:50:29.245683'::timestamp without time zone)
(2 rows)

test=# create extension btree_gist;
CREATE EXTENSION
test=# create index idx_t3_gist on t3 using gist(a,b);
CREATE INDEX
test=# analyze t3;
ANALYZE
test=# explain select * from t3 where a = 10000;
QUERY PLAN
----------------------------------------------------------------------
Index Scan using t3_a_b_idx on t3 (cost=0.42..8.44 rows=1 width=49)
Index Cond: (a = 10000)
(2 rows)

test=# explain select * from t3 where b = '2022-11-18 17:50:29.245683';
QUERY PLAN
-------------------------------------------------------------------------------
Index Scan using idx_t3_gist on t3 (cost=0.28..8.30 rows=1 width=49)
Index Cond: (b = '2022-11-18 17:50:29.245683'::timestamp without time zone)
(2 rows)

test=# explain select * from t3 where a = '10000';
QUERY PLAN
-----------------------------------------------------------------------
Index Scan using idx_t3_gist on t3 (cost=0.28..8.30 rows=1 width=49)
Index Cond: (a = '10000'::bigint)
(2 rows)

test=# explain select * from t3 where a = '10000' or b = '2022-11-18 17:50:29.245683';
QUERY PLAN
------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on t3 (cost=8.58..12.59 rows=1 width=49)
Recheck Cond: ((a = '10000'::bigint) OR (b = '2022-11-18 17:50:29.245683'::timestamp without time zone))
-> BitmapOr (cost=8.58..8.58 rows=1 width=0)
-> Bitmap Index Scan on idx_t3_gist (cost=0.00..4.29 rows=1 width=0)
Index Cond: (a = '10000'::bigint)
-> Bitmap Index Scan on idx_t3_gist (cost=0.00..4.29 rows=1 width=0)
Index Cond: (b = '2022-11-18 17:50:29.245683'::timestamp without time zone)
(7 rows)

test=# explain select * from t3 where a = '10000' and b = '2022-11-18 17:50:29.245683';
QUERY PLAN
-----------------------------------------------------------------------------------------------------------
Index Scan using t3_a_b_idx on t3 (cost=0.42..6.19 rows=1 width=49)
Index Cond: ((a = '10000'::bigint) AND (b = '2022-11-18 17:50:29.245683'::timestamp without time zone))
(2 rows)

test=#

4. SP-GiST索引
和GiST类似,但是是一棵不平衡树,支持多维和海量数据,把空间分割成互不相交的部分。SP-GiST适用于空间可以递归分割成不相交区域的结构,包括四叉树、k-D树和基数树。

5. GIN索引
倒排序索引,适合于包含多个组成值的数据,比如数组,全文检索等。用来保存一个键值对的集合,支持用户定义的索引策略,对于不同的索引策略,可以使用不同的操作符。

键值对(Key,postion list):其中Key是一个键值,而postion list是包含Key的位置值。比如('Bob','10:25 14:3 29:5') 就表示关键字'Bob'在这些位置(元组TID)上存在。 当我们用关键字'Bob'去查询的时候,一下就定位到包含关键字的元组有这三个。

使用方法如下:

test=# create table t4(id int, info text);
CREATE TABLE
test=# insert into t4 values(generate_series(1,10000), md5(random()::text));
INSERT 0 10000
test=# create index idx_t4_gin on t4 using gin(to_tsvector('english',info));
CREATE INDEX
test=# analyze t4;
ANALYZE
test=# \d t4
Table "public.t4"
Column | Type | Collation | Nullable | Default
--------+---------+-----------+----------+---------
id | integer | | |
info | text | | |
Indexes:
"idx_t4_gin" gin (to_tsvector('english'::regconfig, info))

test=# explain select * from t4 where to_tsvector('english', info) @@ plainto_tsquery( 'hello');
QUERY PLAN
-------------------------------------------------------------------------------------------------
Bitmap Heap Scan on t4 (cost=12.25..16.77 rows=1 width=37)
Recheck Cond: (to_tsvector('english'::regconfig, info) @@ plainto_tsquery('hello'::text))
-> Bitmap Index Scan on idx_t4_gin (cost=0.00..12.25 rows=1 width=0)
Index Cond: (to_tsvector('english'::regconfig, info) @@ plainto_tsquery('hello'::text))
(4 rows)

6. BRIN索引
块范围索引,它将数据在磁盘上的block按照一定的数目进行分组,分组之后,计算每组的取值范围。在查找数据时,会遍历这些取值范围,排除掉不在范围之内的分组。BRIN索引适用于存储流式数据日志。例如:按照时间插入的数据,由于数据是按照时间插入,因此数据块上记录的范围信息很少会出现交叉情况,索引过滤后需要比较的数据块也会少很多;反之,如果数据交叉严重,通过索引无法过滤掉任何一个数据块时,操作起来会比全表扫描更加耗时。

test=# create table t5(id int, name text);
CREATE TABLE
test=# insert into t5 values(generate_series(1,100000), md5(random()::text));
INSERT 0 100000
test=# create index idx_t5_brin on t5 using brin(id);
CREATE INDEX
test=# analyze t5;
ANALYZE
test=# \d t5
Table "public.t5"
Column | Type | Collation | Nullable | Default
--------+---------+-----------+----------+---------
id | integer | | |
name | text | | |
Indexes:
"idx_t5_brin" brin (id)

test=# explain select * from t5 where id > 98765;
QUERY PLAN
-------------------------------------------------------------------------------
Bitmap Heap Scan on t5 (cost=12.33..1024.91 rows=1199 width=37)
Recheck Cond: (id > 98765)
-> Bitmap Index Scan on idx_t5_brin (cost=0.00..12.03 rows=14286 width=0)
Index Cond: (id > 98765)
(4 rows)

test=# explain analyze select * from t5 where id > 98765;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on t5 (cost=12.33..1024.91 rows=1199 width=37) (actual time=2.033..2.408 rows=1235 loops=1)
Recheck Cond: (id > 98765)
Rows Removed by Index Recheck: 6605
Heap Blocks: lossy=66
-> Bitmap Index Scan on idx_t5_brin (cost=0.00..12.03 rows=14286 width=0) (actual time=0.043..0.043 rows=1280 loops=1)
Index Cond: (id > 98765)
Planning Time: 0.115 ms
Execution Time: 2.545 ms
(8 rows)
test=#

标签:Index,rows,POSTGRESQL,..,t1,索引,PG,id
From: https://www.cnblogs.com/wangtiantian/p/18058716

相关文章

  • elasticsearch——搜索引擎工具
    创建一个网络,到时候ES要和kibana(可视化工具)放同一个网络(也可以直接用docker-compose一键部署,不过有时不需要两个都开,所以逐一部署吧) dockernetworkcreatees-net然后去DockerHub找文档,dockerpull一下ES和kibana,或者网上找镜像包单点部署ESdockerrun-d\ --namees\......
  • 大厂的视频推荐索引构建解决方案
    关注我,紧跟本系列专栏文章,咱们下篇再续!作者简介:魔都技术专家兼架构,多家大厂后端一线研发经验,各大技术社区头部专家博主。具有丰富的引领团队经验,深厚业务架构和解决方案的积累。负责:中央/分销预订系统性能优化活动&优惠券等营销中台建设交易平台及数据中台等架构和开发设计......
  • Java连接PostgreSQL数据库测试
    importjava.sql.DriverManager;importjava.sql.Connection;importjava.sql.SQLException;importjava.sql.ResultSet;importjava.sql.Statement;publicclassPG{publicstaticvoidmain(String[]args){System.out.println("PostgreSQLJDBC......
  • 技术笔记(2)MMORPG架构
    技术笔记(2)MMORPG架构希望实现的功能或目标:一个功能完整的接近商业案例的MMORPG游戏项目搭建起该游戏项目的基本架构‍学习笔记:IOCContainer类用以保存所有层级以及各个模块的实例实例字典​privateDictionary<Type,object>instancesDict=newDictiona......
  • 基于FPGA的二维DCT变换和逆变换verilog实现,包含testbench
    1.算法运行效果图预览    数据导入到matlab显示图像 2.算法运行软件版本vivado2019.2 matlab2022a 3.算法理论概述       离散余弦变换(DiscreteCosineTransform,DCT)是一种广泛应用于图像和信号处理领域的变换技术。在图像处理中,DCT常被用于图像压......
  • nginx反向代理服务器实现postgreSQL
    可访问的地址:192.168.1.200:9856不可访问的地址:192.168.214.133:32222(pg库的地址)在192.168.1.200服务器上安装nginx,设置一个监听的端口(9856),将地址二192.168.214.133:32222映射到这个端口(版本要大于nginx1.9.xxx,stream和http是同级关系,在Navicat上通过连接主机-192.168.1......
  • 索引
    索引是什么?索引是对数据库表中一列或多列的值进行排序的一种结构。MySQL索引的建立对于MySQL的高效运行是很重要的,索引可以大大提高MySQL的检索速度。B+树:是通过二叉查找树,再由平衡二叉树,B树演化而来    创建索引的三种方式#方式一:createtablet5(idint......
  • mysql没走索引原因分析(转)
    原文:https://juejin.cn/post/71149875593818603821、问题工作中,经常遇到这样的问题,我明明在MySQL表上面加了索引,为什么执行SQL查询的时候却没有用到索引?同一条SQL有时候查询用到了索引,有时候却没用到索引,这是咋回事?原因可能是索引失效了,失效的原因有以下几种,看你有没有踩过类......
  • 网络下载的webp 图片格式,批量转化为jpg格式
    importosfromPILimportImage#获取当前脚本文件所在的文件夹路径script_dir=os.path.dirname(os.path.abspath(__file__))#默认使用当前脚本文件所在的文件夹作为路径folder_path=script_dir#遍历文件夹中的文件forfilenameinos.listdir(folder_path):......
  • 网络下载的webp 图片格式,批量转化为jpg格式
    importosfromPILimportImage#获取当前脚本文件所在的文件夹路径script_dir=os.path.dirname(os.path.abspath(__file__))#默认使用当前脚本文件所在的文件夹作为路径folder_path=script_dir#遍历文件夹中的文件forfilenameinos.listdir(folder_path):......