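Full-text search is a retrieval method in which an indexing program scans every word of a document and builds an index entry for each word, recording how many times and at which positions the word occurs. When a user submits a query, the search program looks it up in this prebuilt index and returns the matches to the user.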
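The indexing scheme described above (one entry per word, with occurrence counts and positions) can be sketched in a few lines of Python. This is a minimal illustration only: the function names `build_inverted_index` and `search` and the sample documents are made up for this example, and the lowercase-and-split tokenization is a simplification, not how Lucene analyzes text.

```python
from collections import defaultdict
import re


def build_inverted_index(docs):
    """Map each word to {doc_id: [positions]} across all documents."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        # Simplified analysis: lowercase, then split on non-word characters.
        for position, word in enumerate(re.findall(r"\w+", text.lower())):
            index[word].setdefault(doc_id, []).append(position)
    return index


def search(index, word):
    """Return {doc_id: (count, positions)} for one query word."""
    postings = index.get(word.lower(), {})
    return {doc_id: (len(pos), pos) for doc_id, pos in postings.items()}


# Hypothetical sample documents, keyed by document id.
docs = {
    1: "Elastic Search is a search engine",
    2: "Lucene is a search library",
}
index = build_inverted_index(docs)
print(search(index, "search"))  # {1: (2, [1, 4]), 2: (1, [3])}
```

A query never rescans the documents; it only reads the postings stored for the queried word.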
Several mainstream tools exist in the full-text search world, chiefly:
(1) Apache Lucene
(2) ElasticSearch
(3) Solr
(4) Ferret
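ElasticSearch (ES) is a distributed, RESTful full-text search and data analytics engine. It can scale out to hundreds of server nodes and handle petabytes of structured or unstructured data.
ElasticSearch is written in Java and built on top of the full-text search library Apache Lucene, which it uses internally for indexing and searching. Its goal is to make full-text search simple: it hides Lucene's complexity behind a simple, consistent RESTful API.
ElasticSearch is used for full-text search, structured search, analytics, and combinations of the three. Some concrete use cases:
(1) Wikipedia: uses ElasticSearch to provide full-text search with highlighted snippets, plus search-as-you-type and did-you-mean suggestions.
(2) The Guardian: uses ElasticSearch to combine social network data with visitor logs, giving its editors real-time feedback on how the public responds to new articles.
(3) Stack Overflow: combines geolocation queries with full-text search, and uses the more-like-this API to find related questions and answers.
(4) GitHub: uses ElasticSearch to query 130 billion lines of code.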
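Core concepts of ElasticSearch:
(1) Cluster: a cluster is identified by a unique name, "elasticsearch" by default. The cluster name matters: only nodes configured with the same cluster name join the same cluster. It can be set in the configuration file (cluster.name);
(2) Node: a node stores cluster data and takes part in the cluster's indexing and search. Like a cluster, each node has a name; in 7.x it defaults to the machine's hostname (earlier versions used the first seven characters of a random UUID), and any name can be assigned (node.name). Nodes that find each other on the network and share a cluster name form a cluster;
(3) Index: an index is a collection of documents (equivalent to a collection in Solr). Each index has a unique name, which is used to operate on it. A cluster can hold any number of indices;
(4) Type: formerly allowed different kinds of documents, such as user data and blog data, to be indexed in one index. Deprecated since 6.0.0; an index now stores a single kind of data;
(5) Document: one indexed record, the basic unit of information in an index, represented as JSON;
(6) Field: each Document is a JSON-like structure made up of fields, each with a corresponding value; fields are comparable to the columns of a relational database table;
(7) Shard: when an index is created, the number of shards it is split into can be specified. Each shard is itself a fully functional, independent "index" and can be placed on any node in the cluster. Sharding allows capacity to be split and scaled horizontally, and lets operations run distributed and in parallel across shards, improving performance and throughput;
(8) Near Realtime (NRT): a document becomes searchable shortly after it is indexed, once the next index refresh has run (by default about once per second), rather than instantly;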
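To make the shard concept more concrete, here is a rough Python sketch of how a document could be assigned to one of an index's primary shards by hashing a routing value (by default the document _id) and taking the result modulo the shard count. This is an illustration under assumptions: `pick_shard` is a made-up name, and `zlib.crc32` is only a stand-in for ElasticSearch's own Murmur3-based hash, so the shard numbers it prints will not match a real cluster.

```python
import zlib


def pick_shard(routing_value, number_of_primary_shards):
    """Map a routing value (by default the document _id) to a shard number."""
    # Stand-in hash: ElasticSearch uses its own Murmur3-based hash, so real
    # shard numbers will differ; only the hash-then-modulo idea is shown.
    hashed = zlib.crc32(routing_value.encode("utf-8"))
    return hashed % number_of_primary_shards


# Example ids; the shard count of 5 is an arbitrary illustrative value.
doc_ids = ["1", "2", "3", "OZhr4YQBeWfeBrcFuThE"]
placement = {doc_id: pick_shard(doc_id, 5) for doc_id in doc_ids}
print(placement)
```

Because the shard number depends on the shard count, this scheme also suggests why the number of primary shards is chosen when the index is created.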
The main ElasticSearch concepts map to relational database concepts as follows:
Relational database (RDBMS) | ElasticSearch (ES) |
Database | Index |
Table | Type (deprecated since 6.0.0) |
Schema (structure, definitions) | Mapping |
Row | Document |
Column | Field |
SQL (query and other statements) | DSL (query and other statements) |
ElasticSearch: https://elastic.co/downloads/elasticsearch
ElasticSearch GitHub: https://github.com/elastic/elasticsearch
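1. ElasticSearch installation and configuration
1) Installing on Windows 10
Visit https://elastic.co/downloads/elasticsearch, download elasticsearch-7.10.2-windows-x86_64.zip, and save it to C:\Applications\Java\. It unpacks to a directory named elasticsearch-7.10.2.
Go to C:\Applications\Java\elasticsearch-7.10.2\bin and double-click elasticsearch.bat. A command-line console opens and shows output like the following: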
...
[2021-03-28T09:42:13,878][INFO ][o.e.x.i.a.TransportPutLifecycleAction] [CHSHL01096TKUAN] adding index lifecycle policy [watch-history-ilm-policy]
[2021-03-28T09:42:14,007][INFO ][o.e.x.i.a.TransportPutLifecycleAction] [CHSHL01096TKUAN] adding index lifecycle policy [ilm-history-ilm-policy]
[2021-03-28T09:42:14,116][INFO ][o.e.x.i.a.TransportPutLifecycleAction] [CHSHL01096TKUAN] adding index lifecycle policy [slm-history-ilm-policy]
[2021-03-28T09:42:14,367][INFO ][o.e.l.LicenseService ] [CHSHL01096TKUAN] license [76dad8a9-b5b1-47ea-aece-fe8d22925547] mode [basic] - valid
[2021-03-28T09:42:14,368][INFO ][o.e.x.s.s.SecurityStatusChangeListener] [CHSHL01096TKUAN] Active license is now [BASIC]; Security is disabled
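Note: 7.10.2 ships with a bundled JDK. If the JAVA_HOME environment variable is set, elasticsearch.bat uses that Java installation instead, so make sure it points to a supported JDK before running the script.
The ElasticSearch web service binds to localhost on port 9200 by default; the host and port can be changed in config\elasticsearch.yml (network.host and http.port).
Opening http://localhost:9200 in a browser shows: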
{ "name" : "CHSHL01096TKUAN", "cluster_name" : "elasticsearch", "cluster_uuid" : "QagiO3dcTBSVNYl10ibjPQ", "version" : { "number" : "7.10.2", "build_flavor" : "default", "build_type" : "zip", "build_hash" : "747e1cc71def077253878a59143c1f785afa92b9", "build_date" : "2021-01-13T00:42:12.435326Z", "build_snapshot" : false, "lucene_version" : "8.7.0", "minimum_wire_compatibility_version" : "6.8.0", "minimum_index_compatibility_version" : "6.0.0-beta1" }, "tagline" : "You Know, for Search" }
Note: CHSHL01096TKUAN is the hostname of the Windows machine.
2) Installing on Ubuntu 20.04
$ cd ~/apps # the apps directory was created by hand under /home/xxx (the Linux user's home directory)
$ wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.10.2-linux-x86_64.tar.gz
$ tar -zvxf elasticsearch-7.10.2-linux-x86_64.tar.gz # unpacks to the directory elasticsearch-7.10.2
$ cd elasticsearch-7.10.2/bin
$ ./elasticsearch
...
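Note: if the JAVA_HOME environment variable is set, ./elasticsearch uses that Java installation rather than the JDK bundled with 7.10.2, so make sure it points to a supported JDK first.
# Open another terminal window and use curl to access http://localhost:9200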
$ curl http://localhost:9200
{ "name" : "Ubuntu20-04", "cluster_name" : "elasticsearch", "cluster_uuid" : "H0Nlh1UiRDK-BSZokqKzdw", "version" : { "number" : "7.10.2", "build_flavor" : "default", "build_type" : "tar", "build_hash" : "747e1cc71def077253878a59143c1f785afa92b9", "build_date" : "2021-01-13T00:42:12.435326Z", "build_snapshot" : false, "lucene_version" : "8.7.0", "minimum_wire_compatibility_version" : "6.8.0", "minimum_index_compatibility_version" : "6.0.0-beta1" }, "tagline" : "You Know, for Search" }
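Note: Ubuntu20-04 is the hostname of the Ubuntu machine.
2. Kibana installation and configuration
Kibana is an open-source analytics and visualization platform for searching, viewing, and interacting with data stored in ElasticSearch indices. Developers and operations staff can use it to run advanced data analysis and visualize the data in charts, tables, and maps.
Kibana: https://www.elastic.co/downloads/kibana
Kibana GitHub: https://github.com/elastic/kibana
1) Installing on Windows 10
Download https://artifacts.elastic.co/downloads/kibana/kibana-7.10.2-windows-x86_64.zip, save it to C:\Applications\Java\, unpack it, and rename the resulting directory to kibana-7.10.2.
Go to C:\Applications\Java\kibana-7.10.2\bin and double-click kibana.bat. A command-line console opens and shows output like the following: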
...
log [09:56:17.150] [info][kibana-monitoring][monitoring][monitoring][plugins] Starting monitoring stats collection
log [09:56:17.152] [info][plugins][watcher] Your basic license does not support watcher. Please upgrade your license.
log [09:56:19.580] [info][listening] Server running at http://localhost:5601
log [09:56:24.956] [info][server][Kibana][http] http server running at http://localhost:5601
log [11:37:14.605] [error][plugins][taskManager][taskManager] [Task Poller Monitor]: Observable Monitor: Hung Observable restarted after 33000ms of inactivity
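The Kibana web service binds to localhost on port 5601 by default; the host and port can be changed in config\kibana.yml (server.host and server.port). Open http://localhost:5601 in a browser.
2) Installing on Ubuntu 20.04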
$ cd ~/apps
$ wget https://artifacts.elastic.co/downloads/kibana/kibana-7.10.2-linux-x86_64.tar.gz
$ tar -zvxf kibana-7.10.2-linux-x86_64.tar.gz # after unpacking, rename the directory to kibana-7.10.2
$ cd kibana-7.10.2/bin
$ ./kibana
...
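Set the host IP and port in config/kibana.yml, then open http://ip:5601 in a browser.
3. ElasticSearch security settings
Since versions 6.8 and 7.1, ElasticSearch's core security features (authentication and role-based access control) have been available free of charge, and X-Pack is bundled with the default ElasticSearch distribution. Using the Windows build of ElasticSearch as an example, this section shows how to use X-Pack to set usernames and passwords for ElasticSearch and related components.
1) Configuring X-Pack
Go to C:\Applications\Java\elasticsearch-7.10.2\config and add the following to elasticsearch.yml: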
http.cors.enabled: true
http.cors.allow-origin: "*"
http.cors.allow-headers: Authorization
xpack.security.enabled: true
xpack.security.transport.ssl.enabled: true
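2) Setting passwords
Restart ElasticSearch so the new settings take effect and leave it running, then open a command prompt in C:\Applications\Java\elasticsearch-7.10.2\bin and run the following command: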
C:\Applications\Java\elasticsearch-7.10.2\bin>elasticsearch-setup-passwords interactive
future versions of Elasticsearch will require Java 11; your Java version from [C:\Program Files\Java\jdk1.8.0_121\jre] does not meet this requirement
Initiating the setup of passwords for reserved users elastic,apm_system,kibana,kibana_system,logstash_system,beats_system,remote_monitoring_user.
You will be prompted to enter passwords as the process progresses.
Please confirm that you would like to continue [y/N]y
Enter password for [elastic]:
Reenter password for [elastic]:
Enter password for [apm_system]:
Reenter password for [apm_system]:
Enter password for [kibana_system]:
Reenter password for [kibana_system]:
Enter password for [logstash_system]:
Reenter password for [logstash_system]:
Enter password for [beats_system]:
Reenter password for [beats_system]:
Enter password for [remote_monitoring_user]:
Reenter password for [remote_monitoring_user]:
Changed password for user [apm_system]
Changed password for user [kibana_system]
Changed password for user [kibana]
Changed password for user [logstash_system]
Changed password for user [beats_system]
Changed password for user [remote_monitoring_user]
Changed password for user [elastic]
Note: every account's password is set to 123456 here. After the passwords are set, Kibana's configuration file (config/kibana.yml) has to be updated as follows:
elasticsearch.username: "kibana_system"
elasticsearch.password: "123456"
After making these changes, restart ElasticSearch and Kibana.
Accessing the secured ElasticSearch with curl without credentials now returns a 401 error:
C:\> curl http://localhost:9200
{"error":{"root_cause":[{"type":"security_exception","reason":"missing authentication credentials for REST request [/]","header":{"WWW-Authenticate":"Basic realm=\"security\" charset=\"UTF-8\""}}],"type":"security_exception","reason":"missing authentication credentials for REST request [/]","header":{"WWW-Authenticate":"Basic realm=\"security\" charset=\"UTF-8\""}},"status":401}
With the username and password supplied, the request succeeds:
C:\> curl --basic -u elastic:123456 http://localhost:9200
{ "name" : "CHSHL01096TKUAN", "cluster_name" : "elasticsearch", "cluster_uuid" : "QagiO3dcTBSVNYl10ibjPQ", "version" : { "number" : "7.10.2", "build_flavor" : "default", "build_type" : "zip", "build_hash" : "747e1cc71def077253878a59143c1f785afa92b9", "build_date" : "2021-01-13T00:42:12.435326Z", "build_snapshot" : false, "lucene_version" : "8.7.0", "minimum_wire_compatibility_version" : "6.8.0", "minimum_index_compatibility_version" : "6.0.0-beta1" }, "tagline" : "You Know, for Search" }
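As an illustration of what `curl --basic -u elastic:123456` adds to the request, the following Python sketch builds the HTTP Basic `Authorization` header by hand. The helper name `basic_auth_header` is made up for this example; the username and password are the demo values used in this article.

```python
import base64


def basic_auth_header(username, password):
    """Build the Authorization header value for HTTP Basic authentication."""
    # Basic auth is just base64("username:password"); it is encoding, not encryption.
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


header = basic_auth_header("elastic", "123456")
print(header)  # Basic ZWxhc3RpYzoxMjM0NTY=
```

Since the credentials are only base64-encoded, anyone who can read the traffic can decode them, which is one reason to prefer HTTPS outside of a local test setup.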
4. ElasticSearch RESTful API
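ElasticSearch offers several ways to interact with it, including a Java API and a RESTful API; this article covers the RESTful API. The RESTful API talks to ElasticSearch over port 9200, and tools such as Postman or curl can be used as clients.
REST (Representational State Transfer) is a set of architectural constraints and principles; an application or design that satisfies them is RESTful. In essence it is a convention for defining interfaces. Characteristics of a RESTful API:
(1) Based on HTTP/HTTPS;
(2) Data is exchanged in XML or JSON format;
(3) Each URI represents a resource;
(4) The client operates on server-side resources with four verbs, GET, POST, PUT, and DELETE:
GET: retrieve a resource
POST: create a resource (can also update one)
PUT: update a resource (in ElasticSearch, also creates an index, or a document with an explicit ID)
DELETE: delete a resource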
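The four verbs above can be sketched with Python's standard library. The example only builds the requests and prints them; it does not send anything, so it runs without a cluster. `build_request` is a hypothetical helper, and the base URL is assumed to be a local node on port 9200.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumed local node on the default port


def build_request(method, path, body=None):
    """Build (but do not send) an HTTP request for an ElasticSearch endpoint."""
    data = None
    headers = {}
    if body is not None:
        # Request bodies are JSON, so declare the content type explicitly.
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


requests = [
    build_request("PUT", "/spring"),                                    # create an index
    build_request("POST", "/spring/_doc", {"name": "Elastic Search"}),  # create a document
    build_request("GET", "/spring/_doc/2"),                             # read a document
    build_request("DELETE", "/spring"),                                 # delete the index
]
for req in requests:
    print(req.get_method(), req.full_url)
```

Actually sending one of these would be a call to `urllib.request.urlopen(req)` against a running node, plus an `Authorization` header if security is enabled.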
API test environment used in this article: ElasticSearch runs on Ubuntu 20.04, with curl as the client on Linux.
1) Index operations
(1) Create an Index
Syntax:
PUT /spring
Command line:
$ curl -X PUT "http://localhost:9200/spring" --basic -u elastic:123456
{"acknowledged":true,"shards_acknowledged":true,"index":"spring"}
(2) View an Index
Syntax:
GET /spring
Command line:
$ curl -X GET "http://localhost:9200/spring" --basic -u elastic:123456
{"spring":{"aliases":{},"mappings":{},"settings":{"index":{"routing":{"allocation":{"include":{"_tier_preference":"data_content"}}},"number_of_shards":"1","provided_name":"spring","creation_date":"1669727877158","number_of_replicas":"1","uuid":"Rkb56yXLTqOJL4iG90LsMQ","version":{"created":"7100299"}}}}}
(3) Delete an Index
Syntax:
DELETE /spring
Command line:
$ curl -X DELETE "http://localhost:9200/spring" --basic -u elastic:123456
{"acknowledged":true}
# View the /spring index (it no longer exists)
$ curl --basic -u elastic:123456 -X GET http://localhost:9200/spring
{"error":{"root_cause":[{"type":"index_not_found_exception","reason":"no such index [spring]","resource.type":"index_or_alias","resource.id":"spring","index_uuid":"_na_","index":"spring"}],"type":"index_not_found_exception","reason":"no such index [spring]","resource.type":"index_or_alias","resource.id":"spring","index_uuid":"_na_","index":"spring"},"status":404}
(4) Create an Index with settings
Syntax:
PUT /spring
{
  "settings": {
    # number of primary shards
    "number_of_shards": 5,
    # number of replicas
    "number_of_replicas": 1
  }
}
Command line:
$ curl -X PUT "http://localhost:9200/spring" --basic -u elastic:123456 -H "Content-Type:application/json" --data '{"settings":{"number_of_shards":5,"number_of_replicas":1}}'
{"acknowledged":true,"shards_acknowledged":true,"index":"spring"}
Note: when running curl from the Windows command prompt, the --data argument has to be written in the following form:
"{"""settings""":{"""number_of_shards""":5,"""number_of_replicas""":1}}"
# View the /spring index
$ curl -X GET "http://localhost:9200/spring" --basic -u elastic:123456
{"spring":{"aliases":{},"mappings":{},"settings":{"index":{"routing":{"allocation":{"include":{"_tier_preference":"data_content"}}},"number_of_shards":"5","provided_name":"spring","creation_date":"1669729181165","number_of_replicas":"1","uuid":"I-ExJI8mTNS6ZKaEG_EXDw","version":{"created":"7100299"}}}}}
# View the mapping of the /spring index (no mapping was specified at creation, so it is empty)
$ curl -X GET "http://localhost:9200/spring/_mapping" --basic -u elastic:123456
{"spring":{"mappings":{}}}
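Note: starting with Elasticsearch 7, the index APIs no longer accept a type name by default; the default type is _doc. To specify a type explicitly, pass include_type_name=true as a request parameter. Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/removal-of-types.html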
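The Windows quoting form noted earlier for --data (wrap the body in double quotes and triple every inner double quote) is tedious to write by hand. A small Python sketch can generate it from a dictionary. `cmd_quote_json` is a made-up helper, and it only deals with double quotes; other cmd.exe metacharacters such as % or ^ are not handled.

```python
import json


def cmd_quote_json(body):
    """Render a JSON body as a cmd.exe argument, tripling inner double quotes."""
    # Compact separators keep the argument free of spaces.
    compact = json.dumps(body, separators=(",", ":"))
    return '"' + compact.replace('"', '"""') + '"'


settings = {"settings": {"number_of_shards": 5, "number_of_replicas": 1}}
print(cmd_quote_json(settings))
# "{"""settings""":{"""number_of_shards""":5,"""number_of_replicas""":1}}"
```

An alternative that avoids shell quoting altogether is to save the body to a file and pass it with curl's `--data @file` form.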
2) Document operations
(1) Create a Document
Create a document with an auto-generated _id. Syntax:
POST /spring/_doc
{
  "name": "Elastic Search",
  "author": "Elastic Company",
  "count": 3
}
Command line:
$ curl -X POST "http://localhost:9200/spring/_doc" --basic -u elastic:123456 -H "Content-Type:application/json" --data '{"name":"Elastic Search","author":"Elastic Company","count":3}'
{"_index":"spring","_type":"_doc","_id":"OZhr4YQBeWfeBrcFuThE","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":0,"_primary_term":1}
# View the mapping of the /spring index (a mapping was created automatically when the document was indexed)
$ curl -X GET "http://localhost:9200/spring/_mapping" --basic -u elastic:123456
{"spring":{"mappings":{"properties":{"author":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},"count":{"type":"long"},"name":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}}}}}}
Create a document with a manually specified _id. Syntax:
POST /spring/_doc/2
{
  "name": "Elastic Search 2",
  "author": "Elastic Company 2",
  "count": 8
}
Command line:
$ curl -X POST "http://localhost:9200/spring/_doc/2" --basic -u elastic:123456 -H "Content-Type:application/json" --data '{"name":"Elastic Search 2","author":"Elastic Company 2","count":8}'
{"_index":"spring","_type":"_doc","_id":"2","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":0,"_primary_term":1}
(2) View a Document
Syntax:
GET /spring/_doc/OZhr4YQBeWfeBrcFuThE
Command line:
$ curl -X GET "http://localhost:9200/spring/_doc/OZhr4YQBeWfeBrcFuThE" --basic -u elastic:123456
{"_index":"spring","_type":"_doc","_id":"OZhr4YQBeWfeBrcFuThE","_version":1,"_seq_no":0,"_primary_term":1,"found":true,"_source":{"name":"Elastic Search","author":"Elastic Company","count":3}}
(3) Modify a Document
Overwrite the whole document. Syntax:
PUT /spring/_doc/OZhr4YQBeWfeBrcFuThE
{
  "name": "Elastic Search - overwrite",
  "author": "Elastic Company - overwrite",
  "count": 5
}
Command line:
$ curl -X PUT "http://localhost:9200/spring/_doc/OZhr4YQBeWfeBrcFuThE" --basic -u elastic:123456 -H "Content-Type:application/json" --data '{"name":"Elastic Search - overwrite","author":"Elastic Company - overwrite","count":5}'
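Partial update with a doc body. The typed path used here works in 7.10.2 but is deprecated; the typeless form is POST /spring/_update/2. Syntax:
POST /spring/_doc/2/_update
{
  "doc": {
    "count": 99
  }
}
Command line: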
$ curl -X POST "http://localhost:9200/spring/_doc/2/_update" --basic -u elastic:123456 -H "Content-Type:application/json" --data '{"doc":{"count": 99}}'
{"_index":"spring","_type":"_doc","_id":"2","_version":5,"result":"updated","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":4,"_primary_term":1}
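The difference between the two update styles can be modeled with plain dictionaries. This is a simplified sketch of the semantics, not ElasticSearch code: `overwrite` and `partial_update` are illustrative names, and the stored document is the one created for _id 2 earlier.

```python
import copy


def overwrite(stored, new_source):
    """PUT-style update: the new body replaces the stored _source entirely."""
    return copy.deepcopy(new_source)


def partial_update(stored, doc):
    """doc-style update: merge the given fields into the stored _source."""
    merged = copy.deepcopy(stored)
    for key, value in doc.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = partial_update(merged[key], value)  # merge nested objects
        else:
            merged[key] = copy.deepcopy(value)
    return merged


stored = {"name": "Elastic Search 2", "author": "Elastic Company 2", "count": 8}
print(overwrite(stored, {"count": 99}))
# {'count': 99}
print(partial_update(stored, {"count": 99}))
# {'name': 'Elastic Search 2', 'author': 'Elastic Company 2', 'count': 99}
```

With the overwrite style, any field left out of the request body is lost, so the doc style is the safer choice when only one or two fields change.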
(4) Delete a Document
Syntax:
DELETE /spring/_doc/OZhr4YQBeWfeBrcFuThE
Command line:
$ curl -X DELETE "http://localhost:9200/spring/_doc/OZhr4YQBeWfeBrcFuThE" --basic -u elastic:123456
{"_index":"spring","_type":"_doc","_id":"OZhr4YQBeWfeBrcFuThE","_version":3,"result":"deleted","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":2,"_primary_term":1}
# View the deleted document
$ curl -X GET "http://localhost:9200/spring/_doc/OZhr4YQBeWfeBrcFuThE" --basic -u elastic:123456
{"_index":"spring","_type":"_doc","_id":"OZhr4YQBeWfeBrcFuThE","found":false}
3) Query operations
(1) Query all documents: match_all
Syntax:
GET /spring/_search
{
  "query": {
    "match_all": {}
  }
}
Command line:
$ curl -X GET "http://localhost:9200/spring/_search" --basic -u elastic:123456 -H "Content-Type:application/json" --data '{"query":{"match_all":{}}}'
{"took":362,"timed_out":false,"_shards":{"total":5,"successful":5,"skipped":0,"failed":0},"hits":{"total":{"value":3,"relation":"eq"},"max_score":1.0,"hits":[{"_index":"spring","_type":"_doc","_id":"OpiW4YQBeWfeBrcFjTgB","_score":1.0,"_source":{"name":"Elastic Search","author":"Elastic Company","count":3}},{"_index":"spring","_type":"_doc","_id":"2","_score":1.0,"_source":{"name":"Elastic Search 2","author":"Elastic Company 2","count":8}},{"_index":"spring","_type":"_doc","_id":"O5ip4YQBeWfeBrcFZjj5","_score":1.0,"_source":{"name":"Elastic Search - Moidfy","author":"Elastic Company - Modify","count":5}}]}}
(2) Loose (phrase prefix) matching: match_phrase_prefix
Syntax:
GET /spring/_search
{
  "query": {
    "match_phrase_prefix": {
      "name": "Elastic Search"
    }
  }
}
Command line:
$ curl -X GET "http://localhost:9200/spring/_search" --basic -u elastic:123456 -H "Content-Type:application/json" --data '{"query":{"match_phrase_prefix":{"name":"Elastic Search"}}}'
{"took":18,"timed_out":false,"_shards":{"total":5,"successful":5,"skipped":0,"failed":0},"hits":{"total":{"value":3,"relation":"eq"},"max_score":0.5753642,"hits":[{"_index":"spring","_type":"_doc","_id":"OpiW4YQBeWfeBrcFjTgB","_score":0.5753642,"_source":{"name":"Elastic Search","author":"Elastic Company","count":3}},{"_index":"spring","_type":"_doc","_id":"2","_score":0.36464313,"_source":{"name":"Elastic Search 2","author":"Elastic Company 2","count":8}},{"_index":"spring","_type":"_doc","_id":"O5ip4YQBeWfeBrcFZjj5","_score":0.36464313,"_source":{"name":"Elastic Search - Moidfy","author":"Elastic Company - Modify","count":5}}]}}
(3) Exact query: term
Syntax:
GET /spring/_search
{
  "query": {
    "term": {
      "name": "Elastic Search"
    }
  }
}
Command line:
$ curl -X GET "http://localhost:9200/spring/_search" --basic -u elastic:123456 -H "Content-Type:application/json" --data '{"query":{"term":{"name":"Elastic Search"}}}'
{"took":3,"timed_out":false,"_shards":{"total":5,"successful":5,"skipped":0,"failed":0},"hits":{"total":{"value":0,"relation":"eq"},"max_score":null,"hits":[]}}
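Note: the exact term query returns no results. This is because the mapping created automatically when the document was indexed defines the name field as follows:
"name":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}}
A "text" field is analyzed (tokenized) at index time. With the default standard analyzer, "Elastic Search" is stored as the lowercase tokens "elastic" and "search", so the token list never contains the whole string "Elastic Search", and a term query does not analyze its input. There are two ways to get an exact match:
Method 1: query the name field's "keyword" sub-field:
"query": { "term": { "name.keyword": "Elastic Search" } }
Method 2: create the index with an explicit mapping that defines name as a keyword field (the type of an existing field cannot be changed, so delete the old index first):
PUT /spring
PUT /spring/_mapping
{
  "properties": {
    "name": { "type": "keyword", "ignore_above": 256 },
    "author": { "type": "keyword", "ignore_above": 256 },
    "count": { "type": "long" }
  }
}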
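To see why the term query misses on a text field but hits on a keyword field, the matching logic can be approximated in Python. This is a rough model under assumptions: `analyze` is only a stand-in for the standard analyzer (lowercase word tokens), and `term_matches` is a made-up helper, not an ElasticSearch API.

```python
import re


def analyze(text):
    """Rough stand-in for the standard analyzer: lowercase word tokens."""
    return re.findall(r"\w+", text.lower())


def term_matches(query_value, field_value, field_type):
    """A term query compares the raw query value with the indexed terms."""
    if field_type == "text":
        indexed_terms = analyze(field_value)  # e.g. ["elastic", "search"]
    else:  # "keyword": the whole value is indexed as a single term
        indexed_terms = [field_value]
    return query_value in indexed_terms


value = "Elastic Search"
print(term_matches("Elastic Search", value, "text"))     # False
print(term_matches("Elastic Search", value, "keyword"))  # True
print(term_matches("elastic", value, "text"))            # True
```

The last call shows that a term query can still hit a text field, but only when the query value equals one of the stored tokens exactly.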
From: https://www.cnblogs.com/tkuang/p/16953660.html