1. Creating an index
In Elasticsearch, the basic syntax for creating an index is:
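PUT /index_name
{
"settings": {
// index settings, such as the number of shards, the number of replicas, analyzers, etc.
},
"mappings": {
"properties": {
"field_name": {
"type": "field_type", // e.g. text, keyword, date
"analyzer": "analyzer_name" // analyzer used by this field (optional)
}
}
}
}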
1.1 Using the default analyzer
Create an index named default_index whose content field uses the default analyzer.
PUT /default_index
{
"mappings": {
"properties": {
"content": {
"type": "text" // text is processed by the default standard analyzer
}
}
}
}
Response
{
"acknowledged": true,
"shards_acknowledged": true,
"index": "default_index"
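}
Explanation:
- acknowledged: boolean. true means the cluster accepted and applied the request. For index creation, it means the master node confirmed that the index was created.
- shards_acknowledged: boolean. true means the required number of copies of every shard was started before the request timed out. By default only the primary shards are waited for (wait_for_active_shards defaults to 1), so true does not by itself guarantee that the replicas are active.
- index: string, the name of the index that was created or operated on; here, default_index.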
View the index
GET /default_index
Response
{
"default_index": {
"aliases": {},
"mappings": {},
"settings": {
"index": {
"routing": {
"allocation": {
"include": {
"_tier_preference": "data_content"
}
}
},
"number_of_shards": "1",
"provided_name": "default_index",
"creation_date": "1721109898057",
"number_of_replicas": "1",
"uuid": "xk4t_iKjQq65Dgx4DNe_yg",
"version": {
"created": "8503000"
}
}
}
}
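}
Explanation:
- default_index: the name of the index. An index is roughly comparable to a database in a relational system.
- aliases: the aliases of the index. It is empty here, so the index has no aliases.
- mappings: the mapping definition of the index. It is empty here, meaning no field structure has been defined.
- settings: the configuration of the index, containing the following key parameters.
  - index: all index-level settings.
    - routing: routing-related settings.
      - allocation: shard allocation settings.
        - include: rules describing which nodes the shards may be placed on.
          - _tier_preference: the preferred data tier. Here it is data_content, so the shards are placed on nodes in the content tier.
    - number_of_shards: the number of primary shards, 1 here. Shards are the basic unit Elasticsearch uses to split data horizontally.
    - provided_name: the index name as given in the request, default_index.
    - creation_date: the creation time of the index, as a Unix timestamp in milliseconds.
    - number_of_replicas: the number of replicas of each primary shard, 1 here. Replicas provide redundancy and can also serve search requests.
    - uuid: the unique identifier of the index, generated automatically by Elasticsearch.
    - version: version information for the index.
      - created: the internal index version the index was created with. "8503000" is an internal version ID, not the release number 8.5.0. It is consistent with the 8.13 release used for the examples in this article.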
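creation_date is an ordinary Unix timestamp in milliseconds, returned as a string, so it is easy to decode. Here is a minimal Python sketch. decode_creation_date is our own helper name and is not part of any Elasticsearch client:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def decode_creation_date(value: str) -> datetime:
    """Convert the millisecond timestamp string from the index settings to a UTC datetime."""
    # timedelta(milliseconds=...) keeps the milliseconds exact (no float rounding)
    return EPOCH + timedelta(milliseconds=int(value))

created = decode_creation_date("1721109898057")
print(created.isoformat())  # 2024-07-16T06:04:58.057000+00:00
```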
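1.2 Using the IK analyzer
Create an index named ik_index whose content field uses the IK analyzer. IK is a third-party Chinese analysis plugin (analysis-ik) and is not bundled with Elasticsearch. Before ik_max_word can be used, the plugin must be installed on every node, in a build that matches your Elasticsearch version.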
PUT /ik_index
{
"settings": { // settings section
"analysis": { // analysis section
"analyzer": { // analyzer section
"ik_analyzer": { // custom analyzer named ik_analyzer
"type": "custom", // this is a custom analyzer
"tokenizer": "ik_max_word" // IK's finest-grained tokenizer, ik_max_word
}
}
}
},
"mappings": { // mappings section
"properties": { // field definitions
"content": { // the content field
"type": "text", // content is a text field
"analyzer": "ik_analyzer" // content is analyzed with ik_analyzer
}
}
}
}
View the index
GET /ik_index
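The following toy Python sketch shows what "finest-grained" means in practice. It does dictionary-based segmentation that emits every dictionary word found at every position, which is roughly the idea behind ik_max_word. By contrast, ik_smart returns a single coarse-grained split. This is not IK's actual algorithm, and the tiny dictionary is made up for illustration:

```python
def max_word_segment(text: str, dictionary: set[str], max_len: int = 8) -> list[str]:
    """Toy fine-grained segmentation: every dictionary word occurring in text,
    ordered by start position, then by length."""
    tokens = []
    for start in range(len(text)):
        for end in range(start + 1, min(len(text), start + max_len) + 1):
            word = text[start:end]
            if word in dictionary:
                tokens.append(word)
    return tokens

# Hypothetical mini dictionary, for illustration only
DICTIONARY = {"中华", "中华人民", "华人", "人民", "共和", "共和国"}
print(max_word_segment("中华人民共和国", DICTIONARY))
# ['中华', '中华人民', '华人', '人民', '共和', '共和国']
```

Overlapping tokens like these improve recall, because a query for any of the sub-words can match the document.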
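1.3 Using a custom analyzer
A custom analyzer lets you combine analysis components: zero or more character filters, exactly one tokenizer, and zero or more token filters.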
PUT /custom_index
{
"settings": { // settings section
"analysis": { // analysis section
"tokenizer": { // tokenizer section
"custom_tokenizer": { // custom tokenizer
"type": "standard", // based on the standard tokenizer
"max_token_length": 5 // maximum token length; longer tokens are split every 5 characters
}
},
"filter": { // token filter section
"custom_stop": { // custom stop word filter
"type": "stop", // filter type: stop
"stopwords": ["the", "a", "an"] // custom stop words
},
"custom_synonym": { // custom synonym filter
"type": "synonym", // filter type: synonym
"synonyms": [ // synonym rules: comma-separated words are treated as equivalent
"quick,fast",
"jumps,leaps"
]
}
},
"analyzer": { // analyzer section
"custom_analyzer": { // custom analyzer
"type": "custom", // custom analyzer type
"tokenizer": "custom_tokenizer", // use the custom tokenizer
"filter": [ // token filters, applied in this order
"lowercase", // built-in lowercase filter
"custom_stop", // custom stop word filter
"custom_synonym" // custom synonym filter
]
}
}
}
},
"mappings": { // mappings section
"properties": { // field definitions
"content": { // the content field
"type": "text", // content is a text field
"analyzer": "custom_analyzer" // content is analyzed with custom_analyzer
}
}
}
}
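The order of the components matters. The tokenizer runs first, and then the token filters run in the order listed. The Python sketch below approximates the custom_analyzer chain above so you can reason about its output. It stands in for the standard tokenizer with a simple \w+ regex, so treat it as an illustration rather than an exact reimplementation. One consequence of "max_token_length": 5 shows up immediately: longer words are cut into 5-character pieces. In Elasticsearch itself, the _analyze API (GET /custom_index/_analyze with "analyzer": "custom_analyzer" and a "text" value) shows the real tokens.

```python
import re

STOPWORDS = {"the", "a", "an"}                             # custom_stop
SYNONYM_GROUPS = [{"quick", "fast"}, {"jumps", "leaps"}]   # custom_synonym

def tokenize(text: str, max_token_length: int = 5) -> list[str]:
    """Approximates the standard tokenizer: split on non-word characters,
    then cut tokens longer than max_token_length into fixed-size pieces."""
    tokens = []
    for word in re.findall(r"\w+", text):
        for i in range(0, len(word), max_token_length):
            tokens.append(word[i:i + max_token_length])
    return tokens

def analyze(text: str) -> list[str]:
    out = []
    for token in tokenize(text):
        token = token.lower()              # lowercase filter
        if token in STOPWORDS:             # custom_stop runs after lowercase, so "The" is removed
            continue
        group = next((g for g in SYNONYM_GROUPS if token in g), None)
        # custom_synonym: emit all equivalents (real Elasticsearch puts them at the same position)
        out.extend(sorted(group) if group else [token])
    return out

print(analyze("The quick fox jumps"))  # ['fast', 'quick', 'fox', 'jumps', 'leaps']
print(analyze("Elasticsearch"))        # ['elast', 'icsea', 'rch']
```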
2. Working with documents
2.1 Adding documents
2.1.1 POST (auto-generated ID)
POST /default_index/_doc
{
"content": "This is a sample document."
}
Response
{
"_index": "default_index",
"_id": "sQerupABhxJU_CY1yjsl",
"_version": 1,
"result": "created",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"_seq_no": 0,
"_primary_term": 1
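}
Explanation:
- _index: the operation was performed on the index named default_index.
- _id: the unique identifier of the document. It was generated automatically here because no ID was given.
- _version: the version number of the document. It starts at 1 and increases by one every time the document is written or deleted.
- result: the outcome of the operation, created here.
- _shards:
  - total: the number of shard copies the operation should be applied to (1 primary + 1 replica = 2).
  - successful: the number of shard copies on which the operation succeeded (1). This typically happens on a single-node cluster, where the replica cannot be allocated to the same node as its primary.
  - failed: the number of shard copies on which the operation failed.
- _seq_no: a sequence number Elasticsearch assigns to every write operation on a shard. It is used internally and for optimistic concurrency control. It starts at 0 and increases with every operation on the shard, not per document.
- _primary_term: the current term of the primary shard. It increases each time a different shard copy becomes the primary (for example, after a node failure).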
# Get the document by ID
GET /default_index/_doc/sQerupABhxJU_CY1yjsl
Response
{
"_index": "default_index",
"_id": "sQerupABhxJU_CY1yjsl",
"_version": 1,
"_seq_no": 0,
"_primary_term": 1,
"found": true,
"_source": {
"content": "This is a sample document."
}
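}
Explanation:
- _index: the index the document is stored in.
- _id: the unique identifier of the document.
- _version: the version number of the document.
- _seq_no (sequence number): the sequence number of the last operation that modified the document.
- _primary_term: the primary term of the last operation that modified the document.
- found: whether the requested document was found.
- _source:
  - content: the actual content of the document. _source holds the original JSON document exactly as it was indexed; here its content field contains the text of the document.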
2.1.2 Adding or updating a single document with PUT (explicit ID)
PUT /default_index/_doc/1
{
"content": "This is another sample document with a specific ID."
}
Response
{
"_index": "default_index",
"_id": "1",
"_version": 1,
"result": "created",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"_seq_no": 1,
"_primary_term": 1
}
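Response when the same PUT is sent a second time. The document now exists, so it is replaced: result is updated and _version becomes 2.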
{
"_index": "default_index",
"_id": "1",
"_version": 2,
"result": "updated",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"_seq_no": 2,
"_primary_term": 1
}
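The numbers above follow a simple pattern. _seq_no is assigned per shard to every write, whatever the document, while _version counts the writes to one document. Below is a hypothetical in-memory Python model that reproduces this pattern. ToyShard is our own name and is not Elasticsearch code. The model also sketches optimistic concurrency control: Elasticsearch accepts if_seq_no and if_primary_term parameters on index requests and rejects the write with a version conflict if the document has changed since it was read.

```python
class VersionConflict(Exception):
    """Stands in for Elasticsearch's 409 version conflict error."""

class ToyShard:
    """Hypothetical model of one primary shard; not Elasticsearch's implementation."""

    def __init__(self, primary_term: int = 1):
        self.primary_term = primary_term
        self.last_seq_no = -1   # the first operation gets _seq_no 0
        self.docs = {}          # doc_id -> {"source", "version", "seq_no", "primary_term"}

    def index(self, doc_id, source, if_seq_no=None, if_primary_term=None):
        current = self.docs.get(doc_id)
        if if_seq_no is not None or if_primary_term is not None:
            # conditional write: the caller's view of the document must still be current
            if (current is None or current["seq_no"] != if_seq_no
                    or current["primary_term"] != if_primary_term):
                raise VersionConflict(doc_id)
        self.last_seq_no += 1   # one sequence number per write on this shard
        version = 1 if current is None else current["version"] + 1
        self.docs[doc_id] = {"source": source, "version": version,
                             "seq_no": self.last_seq_no, "primary_term": self.primary_term}
        return {"_id": doc_id, "_version": version,
                "result": "created" if current is None else "updated",
                "_seq_no": self.last_seq_no, "_primary_term": self.primary_term}

shard = ToyShard()
shard.index("sQerupABhxJU_CY1yjsl", {"content": "This is a sample document."})  # _seq_no 0
first = shard.index("1", {"content": "This is another sample document with a specific ID."})
second = shard.index("1", {"content": "This is another sample document with a specific ID."})
print(first["_version"], first["_seq_no"], second["_version"], second["_seq_no"], second["result"])
# 1 1 2 2 updated
```

In Kibana, the equivalent conditional write would be PUT /default_index/_doc/1?if_seq_no=2&if_primary_term=1. It fails with a conflict if another write has touched the document in the meantime.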
2.1.3 Adding multiple documents with the bulk API
POST /_bulk
{ "index" : { "_index" : "default_index", "_id" : "2" } }
{ "content" : "Here is a bulk document 2" }
{ "index" : { "_index" : "default_index", "_id" : "3" } }
{ "content" : "Here is a bulk document 3" }
Response
{
"errors": false,
"took": 11,
"items": [
{
"index": {
"_index": "default_index",
"_id": "2",
"_version": 1,
"result": "created",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"_seq_no": 3,
"_primary_term": 1,
"status": 201
}
},
{
"index": {
"_index": "default_index",
"_id": "3",
"_version": 1,
"result": "created",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"_seq_no": 4,
"_primary_term": 1,
"status": 201
}
}
]
}
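The bulk body is newline-delimited JSON (NDJSON). Each action line is followed by a source line for index, create and update actions, and the whole body must end with a newline. A bulk request also returns HTTP 200 even when individual items fail, so check the top-level errors flag and look for an error object in each item. Here is a minimal Python sketch of both steps. build_bulk_body and failed_items are hypothetical helper names:

```python
import json

def build_bulk_body(index: str, docs: list[tuple[str, dict]]) -> str:
    """Build an NDJSON body of index actions; each document is (id, source)."""
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(source, ensure_ascii=False))
    return "\n".join(lines) + "\n"  # the body must end with a newline

def failed_items(response: dict) -> list[dict]:
    """Return the per-item results of a bulk response that carry an error."""
    if not response.get("errors"):
        return []
    failed = []
    for item in response["items"]:
        (result,) = item.values()  # each item has exactly one key: index/create/update/delete
        if "error" in result:
            failed.append(result)
    return failed

body = build_bulk_body("default_index", [
    ("2", {"content": "Here is a bulk document 2"}),
    ("3", {"content": "Here is a bulk document 3"}),
])
print(body, end="")
```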
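2.2 Updating documents
2.2.1 Replacing a whole document
A PUT to an existing ID replaces the entire document and increments _version, even when the new content is identical to the old content.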
PUT /default_index/_doc/1
{
"content": "This is another sample document with a specific ID."
}
Response
{
"_index": "default_index",
"_id": "1",
"_version": 3,
"result": "updated",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"_seq_no": 5,
"_primary_term": 1
}
2.2.2 Partial update
To change only part of a document instead of replacing the whole document, use the _update API.
POST /default_index/_update/1
{
"doc": {
"title": "New data",
"content": "Updated content"
}
}
Response
{
"_index": "default_index",
"_id": "1",
"_version": 4,
"result": "updated",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"_seq_no": 6,
"_primary_term": 1
}
GET /default_index/_doc/1
Response
{
"_index": "default_index",
"_id": "1",
"_version": 4,
"_seq_no": 6,
"_primary_term": 1,
"found": true,
"_source": {
"content": "Updated content",
"title": "New data"
}
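}
Explanation:
This operation sets the content field of the existing document with ID 1 to "Updated content" and adds a new field named title. Fields that are not mentioned in doc keep their existing values.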
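Applying doc in an _update request is a merge. Fields in doc overwrite existing values, new fields are added, nested objects are merged recursively, and fields that are not mentioned are kept. If the merge changes nothing, Elasticsearch skips the write and reports result: noop (the default detect_noop behaviour). Here is a minimal Python sketch of these merge rules. merge_partial is our own helper and is not part of any client:

```python
import copy

def merge_partial(source: dict, doc: dict) -> tuple[dict, bool]:
    """Merge a partial document into source, the way an _update "doc" is applied.
    Returns the merged document and whether anything changed."""
    merged = copy.deepcopy(source)
    changed = False
    for key, value in doc.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key], sub_changed = merge_partial(merged[key], value)  # objects merge recursively
            changed = changed or sub_changed
        elif key not in merged or merged[key] != value:
            merged[key] = copy.deepcopy(value)  # scalars and arrays are replaced
            changed = True
    return merged, changed

before = {"content": "This is another sample document with a specific ID."}
after, changed = merge_partial(before, {"title": "New data", "content": "Updated content"})
print(after, changed)
# {'content': 'Updated content', 'title': 'New data'} True
_, changed_again = merge_partial(after, {"title": "New data"})
print(changed_again)  # False: Elasticsearch would report "result": "noop"
```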
2.2.3 Bulk update
POST /_bulk
{ "update": { "_index": "default_index", "_id": "1" } }
{ "doc": { "content": "Updated content for document 1" } }
{ "update": { "_index": "default_index", "_id": "2" } }
{ "doc": { "content": "Updated content for document 2" } }
Response
{
"errors": false,
"took": 18,
"items": [
{
"update": {
"_index": "default_index",
"_id": "1",
"_version": 5,
"result": "updated",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"_seq_no": 7,
"_primary_term": 1,
"status": 200
}
},
{
"update": {
"_index": "default_index",
"_id": "2",
"_version": 2,
"result": "updated",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"_seq_no": 8,
"_primary_term": 1,
"status": 200
}
}
]
}
2.3 Deleting documents
2.3.1 Deleting a single document
DELETE /default_index/_doc/1
Response
{
"_index": "default_index",
"_id": "1",
"_version": 6,
"result": "deleted",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"_seq_no": 9,
"_primary_term": 1
}
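2.3.2 Deleting multiple documents
To delete every document that matches a query, use _delete_by_query. Because match analyzes the query text, even a common word such as "is" deletes every document that contains it. Running the same query with _search first shows what will be removed.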
POST /default_index/_delete_by_query
{
"query": {
"match": {
"content": "is"
}
}
}
Response
{
"took": 80,
"timed_out": false,
"total": 2,
"deleted": 2,
"batches": 1,
"version_conflicts": 0,
"noops": 0,
"retries": {
"bulk": 0,
"search": 0
},
"throttled_millis": 0,
"requests_per_second": -1,
"throttled_until_millis": 0,
"failures": []
}
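2.4 Searching documents
2.4.1 Match query
A match query analyzes the query text with the field's analyzer and searches the specified field for the resulting terms. By default, a document matches if it contains any of them. Synonyms and word variants are only taken into account if the analyzer includes filters for them, such as a synonym or stemming filter.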
GET /default_index/_search
{
"query": {
"match": {
"content": "bulk"
}
}
}
Response
{
"took": 71,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 2,
"relation": "eq"
},
"max_score": 0.49917626,
"hits": [
{
"_index": "default_index",
"_id": "2",
"_score": 0.49917626,
"_source": {
"content": "Here is a bulk document 2"
}
},
{
"_index": "default_index",
"_id": "3",
"_score": 0.49917626,
"_source": {
"content": "Here is a bulk document 3"
}
}
]
}
}
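Both documents match for the following reason. The standard analyzer lowercases the query text and splits it into terms, and match (with the default operator or) returns every document containing any of those terms. Here is a simplified Python sketch. It assumes a \w+ regex is close enough to the standard tokenizer for these English examples; match_or and the sample data are illustrative only:

```python
import re

def standard_analyze(text: str) -> list[str]:
    """Rough stand-in for the standard analyzer: split on non-word characters and lowercase.
    Like the standard analyzer, it removes no stop words by default."""
    return [token.lower() for token in re.findall(r"\w+", text)]

def match_or(docs: dict[str, dict], field: str, query: str) -> list[str]:
    """IDs of documents whose field shares at least one term with the analyzed query."""
    query_terms = set(standard_analyze(query))
    return [doc_id for doc_id, source in docs.items()
            if query_terms & set(standard_analyze(source.get(field, "")))]

docs = {
    "sQerupABhxJU_CY1yjsl": {"content": "This is a sample document."},
    "2": {"content": "Here is a bulk document 2"},
    "3": {"content": "Here is a bulk document 3"},
}
print(match_or(docs, "content", "Bulk"))  # ['2', '3']
print(match_or(docs, "content", "is"))    # all three IDs
```

The second call also explains why a match on a very common word such as "is" in _delete_by_query removes every document that contains it.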
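2.4.2 Term query
A term query matches exact values: the query value is not analyzed. Fields of type keyword are not analyzed either, so the whole string is indexed as a single token instead of being split into several tokens, as a text field would be. This makes term queries a natural fit for keyword fields.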
PUT /key_word_index
{
"mappings":
{
"properties":
{
"key_word":
{
"type": "keyword"
}
}
}
}
POST /key_word_index/_doc
{
"key_word": "This is a document."
}
GET /key_word_index/_search
{
"query": {
"term": {
"key_word": "This is a document."
}
}
}
Response
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 0.2876821,
"hits": [
{
"_index": "key_word_index",
"_id": "swfaupABhxJU_CY1DDuu",
"_score": 0.2876821,
"_source": {
"key_word": "This is a document."
}
}
]
}
}
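The difference between keyword and text comes down to which tokens end up in the index. Here is a Python sketch under the same simplification as the match example (a \w+ regex standing in for the standard analyzer). The helper names are ours:

```python
import re

def keyword_tokens(value: str) -> list[str]:
    return [value]  # keyword: the whole string is a single token, case preserved

def text_tokens(value: str) -> list[str]:
    return [t.lower() for t in re.findall(r"\w+", value)]  # text: analyzed into lowercase words

def term_query(tokens: list[str], value: str) -> bool:
    return value in tokens  # term: the query value is NOT analyzed and must equal a token exactly

value = "This is a document."
print(term_query(keyword_tokens(value), "This is a document."))  # True
print(term_query(text_tokens(value), "This is a document."))     # False
print(term_query(text_tokens(value), "document"))                # True
print(term_query(text_tokens(value), "Document"))                # False
```

In practice, use term on keyword fields (or on a .keyword sub-field) and match on text fields.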
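2.4.3 Bool query
The bool query is one of the most powerful query types in Elasticsearch. It combines multiple queries as clauses of four kinds: must (must match, contributes to the score), should (should match, raises the score), must_not (must not match) and filter (must match, but does not affect the score).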
POST /simple_index/_doc/1
{
"title": "Learn Elasticsearch",
"published": true
}
POST /simple_index/_doc/2
{
"title": "Advanced Elasticsearch",
"published": false
}
POST /simple_index/_doc/3
{
"title": "Elasticsearch Tips",
"published": true
}
GET /simple_index/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"title": "Elasticsearch"
}
}
],
"filter": [
{
"term": {
"published": true
}
}
]
}
}
}
Response
{
"took": 34,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 2,
"relation": "eq"
},
"max_score": 0.13353139,
"hits": [
{
"_index": "simple_index",
"_id": "1",
"_score": 0.13353139,
"_source": {
"title": "Learn Elasticsearch",
"published": true
}
},
{
"_index": "simple_index",
"_id": "3",
"_score": 0.13353139,
"_source": {
"title": "Elasticsearch Tips",
"published": true
}
}
]
}
}
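The must clause requires the title to match "Elasticsearch", and the filter clause keeps only documents whose published field is true. Filter clauses do not contribute to _score, so the scores of the hits come from the must clause alone.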
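Where do scores such as 0.13353139 here and 0.2876821 in the term example come from? By default, Elasticsearch scores with BM25 (k1 = 1.2, b = 0.75). The Python sketch below assumes the form of BM25 that keeps the (k1 + 1) factor in the numerator, which is consistent with both scores. Under that form, when a term occurs once in a field of average length, the length-normalized term frequency is exactly 1, so the score equals the idf. In the bool example all three titles contain "elasticsearch" (N = 3 documents, n = 3 containing the term). In the term example, the index holds a single document (N = 1, n = 1).

```python
import math

def bm25_idf(doc_count: int, docs_with_term: int) -> float:
    return math.log(1 + (doc_count - docs_with_term + 0.5) / (docs_with_term + 0.5))

def bm25_score(freq, field_len, avg_field_len, doc_count, docs_with_term, k1=1.2, b=0.75):
    """Score of a single term in one field (assumed variant with the (k1 + 1) factor)."""
    tf_norm = freq * (k1 + 1) / (freq + k1 * (1 - b + b * field_len / avg_field_len))
    return bm25_idf(doc_count, docs_with_term) * tf_norm

# bool example: "elasticsearch" occurs once in each 2-token title; 3 of 3 documents contain it
print(round(bm25_score(1, 2, 2, doc_count=3, docs_with_term=3), 8))  # 0.13353139
# term example: one keyword token; 1 of 1 documents contains it
print(round(bm25_score(1, 1, 1, doc_count=1, docs_with_term=1), 7))  # 0.2876821
```

To see the real breakdown for a hit, the explain API (for example GET /simple_index/_explain/1 with the same query body) returns the score calculation step by step.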
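Prepare the data for the following queries: delete simple_index and recreate it with explicit mappings and the IK analyzer.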
DELETE /simple_index
PUT /simple_index
{
"settings": {
"number_of_shards": 1,
"number_of_replicas": 0,
"analysis": {
"analyzer": {
"ik_analyzer": {
"type": "custom",
"tokenizer": "ik_max_word"
}
}
}
},
"mappings": {
"properties": {
"title": {
"type": "text",
"analyzer": "ik_analyzer",
"fields": {
"keyword": {
"type": "keyword"
}
}
},
"description": {
"type": "text",
"analyzer": "ik_analyzer"
},
"date": {
"type": "date"
},
"published": {
"type": "boolean"
}
}
}
}
POST /simple_index/_doc/1
{
"title": "Elasticsearch入门",
"description": "学习Elasticsearch的基础知识",
"date": "2021-06-15",
"published": true
}
POST /simple_index/_doc/2
{
"title": "Elasticsearch进阶",
"description": "深入了解Elasticsearch的高级功能",
"date": "2021-07-20",
"published": false
}
POST /simple_index/_doc/3
{
"title": "Elasticsearch技巧",
"description": "使用Elasticsearch的实用技巧",
"date": "2021-12-01",
"published": true
}
2.4.4 Range query
GET /simple_index/_search
{
"query": {
"range": {
"date": {
"gte": "2021-01-01",
"lte": "2021-12-31"
}
}
}
}
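gte and lte are inclusive bounds, and gt and lt are the exclusive versions, so this query returns every document dated in 2021, which is all three. Here is a small Python illustration of the comparison on the three sample dates. in_range is an illustrative helper and ignores Elasticsearch's date math and time-of-day handling:

```python
from datetime import date

docs = {"1": date(2021, 6, 15), "2": date(2021, 7, 20), "3": date(2021, 12, 1)}

def in_range(value, gte=None, lte=None, gt=None, lt=None) -> bool:
    """Inclusive (gte/lte) and exclusive (gt/lt) bounds, as in a range query."""
    return ((gte is None or value >= gte) and (lte is None or value <= lte)
            and (gt is None or value > gt) and (lt is None or value < lt))

hits = [doc_id for doc_id, d in docs.items()
        if in_range(d, gte=date(2021, 1, 1), lte=date(2021, 12, 31))]
print(hits)  # ['1', '2', '3']
```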
2.4.5 Wildcard query
GET /simple_index/_search
{
"query": {
"wildcard": {
"title.keyword": {
"value": "Elastic*"
}
}
}
}
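The wildcard query supports two operators: * matches any sequence of characters (including none) and ? matches exactly one character. On a keyword field such as title.keyword the match is case-sensitive unless "case_insensitive": true is set. Here is a Python sketch of these semantics. wildcard_match is our own helper, which translates the pattern into a regular expression:

```python
import re

def wildcard_match(pattern: str, value: str, case_insensitive: bool = False) -> bool:
    """* = any sequence of characters (possibly empty), ? = exactly one character."""
    regex = "".join(".*" if ch == "*" else "." if ch == "?" else re.escape(ch) for ch in pattern)
    flags = re.DOTALL | (re.IGNORECASE if case_insensitive else 0)
    return re.fullmatch(regex, value, flags) is not None

titles = ["Elasticsearch入门", "Elasticsearch进阶", "Elasticsearch技巧"]
print([t for t in titles if wildcard_match("Elastic*", t)])                    # all three titles
print(wildcard_match("elastic*", "Elasticsearch入门"))                         # False: case-sensitive
print(wildcard_match("elastic*", "Elasticsearch入门", case_insensitive=True))  # True
```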
2.4.6 Multi-match query
GET /simple_index/_search
{
"query": {
"multi_match": {
"query": "Elasticsearch",
"fields": ["title", "description"]
}
}
}
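multi_match runs the query against each listed field. With the default type, best_fields, a document's score is the score of its best-matching field. The optional tie_breaker adds tie_breaker × the scores of the other matching fields. Here is a Python sketch of that combination rule; the per-field scores are made-up numbers:

```python
def best_fields_score(field_scores: dict[str, float], tie_breaker: float = 0.0) -> float:
    """Combine per-field scores the way multi_match's best_fields type does."""
    scores = sorted(field_scores.values(), reverse=True)
    if not scores:
        return 0.0
    return scores[0] + tie_breaker * sum(scores[1:])

field_scores = {"title": 0.5, "description": 0.3}  # hypothetical per-field scores
print(best_fields_score(field_scores))                 # 0.5
print(round(best_fields_score(field_scores, 0.3), 2))  # 0.59
```

Individual fields can also be boosted in the fields list, for example "title^2".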
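2.4.7 Fuzzy query
# For Chinese text, the field must be analyzed with the IK analyzer
GET /simple_index/_search
{
"query": {
"fuzzy": {
"title": {
"value": "进段",
"fuzziness": 1 // "进段" is one substitution away from "进阶"; the default AUTO allows no edits for a 2-character term
}
}
}
}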
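A fuzzy query matches terms within a given edit distance of the query value. An edit is inserting, deleting or substituting one character, or, by default, transposing two adjacent characters. With the default fuzziness AUTO, the allowed distance depends on the term length: 0 edits for 1 to 2 characters, 1 edit for 3 to 5, and 2 edits for longer terms. "进段" has two characters, so under AUTO it must match exactly. An explicit "fuzziness": 1 allows the single substitution needed to reach "进阶". Here is a Python sketch of both rules, using optimal string alignment distance as an approximation of the transposition-aware distance. The helper names are ours:

```python
def osa_distance(a: str, b: str) -> int:
    """Edit distance allowing insertions, deletions, substitutions and adjacent transpositions."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent transposition
    return d[len(a)][len(b)]

def auto_fuzziness(term: str, low: int = 3, high: int = 6) -> int:
    """Maximum number of edits allowed by fuzziness AUTO (AUTO:3,6 by default)."""
    if len(term) < low:
        return 0
    return 1 if len(term) < high else 2

query = "进段"
print(osa_distance(query, "进阶"), auto_fuzziness(query))  # 1 0
```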
2.4.8 Aggregation query
GET /simple_index/_search
{
"size": 0, // return no documents, only the aggregation results
"aggs": {
"published_stats": {
"terms": {
"field": "published"
}
}
}
}
Response
{
"took": 225,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 3,
"relation": "eq"
},
"max_score": null,
"hits": []
},
"aggregations": {
"published_stats": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 1,
"key_as_string": "true",
"doc_count": 2
},
{
"key": 0,
"key_as_string": "false",
"doc_count": 1
}
]
}
}
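}
Explanation:
The true bucket contains 2 documents, meaning that 2 documents have published set to true. The false bucket contains 1 document, meaning that 1 document has published set to false. For boolean fields, key is returned as 1 or 0 and key_as_string as "true" or "false".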
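Conceptually, a terms aggregation groups documents by the value of a field and counts them. By default, buckets are sorted by doc_count in descending order and only the top 10 are returned (size). Here is a Python sketch of that counting for the three sample documents. terms_buckets is our own helper and is specialized for boolean values to mimic the key/key_as_string pair above:

```python
from collections import Counter

published = {"1": True, "2": False, "3": True}  # published values of the three sample documents

def terms_buckets(values, size: int = 10) -> list[dict]:
    """Count boolean values; buckets sorted by doc_count desc, then key asc."""
    counts = Counter(values)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:size]
    return [{"key": int(v), "key_as_string": str(v).lower(), "doc_count": n} for v, n in ordered]

print(terms_buckets(published.values()))
# [{'key': 1, 'key_as_string': 'true', 'doc_count': 2}, {'key': 0, 'key_as_string': 'false', 'doc_count': 1}]
```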
3. Mapping operations
3.1 Creating an index with mappings
PUT /blog_posts
{
"settings": {
"number_of_shards": 1,
"number_of_replicas": 0
},
"mappings": {
"properties": {
"title": {
"type": "text"
},
"content": {
"type": "text"
},
"published_date": {
"type": "date"
}
}
}
}
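- The title and content fields are mapped as text, the data type best suited to full-text search. By default, text fields are analyzed (split into terms), which is what makes full-text search on them possible.
- The published_date field is mapped as date, which makes date queries possible, for example finding all blog posts published after a given date.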
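3.2 Adding a new field to the mapping
New fields can be added to an existing mapping at any time. The type of an existing field cannot be changed this way; that requires creating a new index and reindexing the data.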
PUT /blog_posts/_mapping
{
"properties": {
"views": {
"type": "integer"
}
}
}
3.3 Viewing the mapping
GET /blog_posts/_mapping
3.4 Creating an index template
PUT /_index_template/blog_template
{
"index_patterns": ["blog-*"],
"template": {
"settings": {
"number_of_shards": 1,
"number_of_replicas": 1
},
"mappings": {
"properties": {
"title": {
"type": "text"
},
"content": {
"type": "text"
},
"date": {
"type": "date"
},
"author": {
"type": "keyword"
},
"tags": {
"type": "keyword"
}
}
}
}
}
PUT /blog-p/_doc/1
{
"author": "张三"
}
GET /blog-p/_mapping
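The template applies to every new index whose name starts with blog-. In this example, indexing a document into the non-existent blog-p index creates that index automatically with the template's settings and mappings. Templates are applied only when an index is created, so existing indices are not affected.
- index_patterns: an array of index name patterns. Creating an index whose name matches one of them triggers the template.
- template: the settings and mappings applied automatically to each matching index.
  - settings: includes the number of shards and replicas.
  - mappings: defines the types of the fields in the index, such as text, date and keyword.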
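Only the index name decides whether a template applies. For example, blog_posts from section 3.1 would not match blog-* even if it were created now, because its name uses an underscore. Here is a Python sketch of the name check. It treats * as the only wildcard; matches_pattern and matching_indices are our own helpers:

```python
import re

def matches_pattern(index_name: str, pattern: str) -> bool:
    """True if index_name matches an index pattern, treating * as 'any characters'."""
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, index_name) is not None

def matching_indices(names: list[str], index_patterns: list[str]) -> list[str]:
    return [n for n in names if any(matches_pattern(n, p) for p in index_patterns)]

print(matching_indices(["blog-p", "blog_posts", "blog-2024", "myblog-1"], ["blog-*"]))
# ['blog-p', 'blog-2024']
```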
4. Other
4.1 Filtering the returned fields
PUT /chinese_blog
{
"settings": {
"number_of_shards": 1,
"number_of_replicas": 0,
"analysis": {
"analyzer": {
"ik_analyzer": {
"type": "custom",
"tokenizer": "ik_max_word"
}
}
}
},
"mappings": {
"properties": {
"title": {
"type": "text",
"analyzer": "ik_analyzer"
},
"content": {
"type": "text",
"analyzer": "ik_analyzer"
},
"author": {
"type": "keyword"
},
"published_date": {
"type": "date"
}
}
}
}
POST /chinese_blog/_doc
{
"title": "如何学习 Elasticsearch",
"content": "Elasticsearch 是一个高度可扩展的开源全文搜索和分析引擎...",
"author": "张三",
"published_date": "2021-01-01"
}
POST /chinese_blog/_doc
{
"title": "理解 JSON 数据格式",
"content": "JSON (JavaScript 对象表示法) 是一种轻量级的数据交换格式...",
"author": "李四",
"published_date": "2021-02-15"
}
GET /chinese_blog/_search
{
"_source": ["title", "published_date"],
"query": {
"match": {
"title": "Elasticsearch"
}
}
}
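This query returns all articles whose title contains "Elasticsearch", but the _source of each hit contains only the title and published_date fields.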
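_source filtering only trims what each hit returns; the full document is still stored and searchable. Here is a Python sketch of the include-list behaviour for top-level fields. filter_source is illustrative; Elasticsearch also accepts wildcards and exclude lists, which this sketch leaves out:

```python
def filter_source(source: dict, includes: list[str]) -> dict:
    """Keep only the listed top-level fields, in their original order."""
    return {k: v for k, v in source.items() if k in includes}

doc = {"title": "如何学习 Elasticsearch",
       "content": "Elasticsearch 是一个高度可扩展的开源全文搜索和分析引擎...",
       "author": "张三", "published_date": "2021-01-01"}
print(filter_source(doc, ["title", "published_date"]))
# {'title': '如何学习 Elasticsearch', 'published_date': '2021-01-01'}
```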
4.2 Pagination
PUT /blog_postss
{
"settings": {
"number_of_shards": 1,
"number_of_replicas": 0
},
"mappings": {
"properties": {
"title": {
"type": "text"
},
"content": {
"type": "text"
},
"published_date": {
"type": "date",
"format": "yyyy-MM-dd" // date format accepted by this field
}
}
}
}
POST /blog_postss/_bulk
{ "index": {} }
{ "title": "Introduction to Elasticsearch", "content": "Learn how to use Elasticsearch, from beginner basics to advanced techniques.", "published_date": "2021-01-01" }
{ "index": {} }
{ "title": "Advanced Elasticsearch", "content": "Dive deep into Elasticsearch's advanced features and workings.", "published_date": "2021-01-02" }
{ "index": {} }
{ "title": "Elasticsearch Mapping", "content": "Understanding mappings and how they work in Elasticsearch.", "published_date": "2021-01-03" }
{ "index": {} }
{ "title": "Data Ingestion with Elasticsearch", "content": "Best practices for ingesting data into your Elasticsearch cluster.", "published_date": "2021-01-04" }
{ "index": {} }
{ "title": "Text Analysis and Elasticsearch", "content": "Exploring text analysis capabilities in Elasticsearch.", "published_date": "2021-01-05" }
{ "index": {} }
{ "title": "Elasticsearch Query DSL", "content": "Learn how to use the powerful Query DSL to get the most out of Elasticsearch.", "published_date": "2021-01-06" }
{ "index": {} }
{ "title": "Monitoring Elasticsearch", "content": "Techniques for monitoring and maintaining an Elasticsearch cluster.", "published_date": "2021-01-07" }
{ "index": {} }
{ "title": "Security in Elasticsearch", "content": "Implementing security and access control in Elasticsearch.", "published_date": "2021-01-08" }
{ "index": {} }
{ "title": "Scalability in Elasticsearch", "content": "Strategies to scale your Elasticsearch cluster.", "published_date": "2021-01-09" }
{ "index": {} }
{ "title": "Elasticsearch and Kibana", "content": "Utilizing Kibana for visualizing data in Elasticsearch.", "published_date": "2021-01-10" }
# Page 1: hits 1-3
GET /blog_postss/_search
{
"from": 0,
"size": 3,
"query": {
"match_all": {}
}
}
# Page 2: hits 4-6
GET /blog_postss/_search
{
"from": 3,
"size": 3,
"query": {
"match_all": {}
}
}
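from is a zero-based offset into the result list and size is the page length, so page p (counting from 1) starts at from = (p - 1) × size. Deep pagination with from/size is limited: from + size may not exceed index.max_result_window, which defaults to 10,000. search_after is the usual alternative for paging deeper. Here is a Python sketch of the arithmetic; page_params is our own helper:

```python
MAX_RESULT_WINDOW = 10_000  # default value of index.max_result_window

def page_params(page: int, size: int) -> dict:
    """Return the from/size body parameters for a 1-based page number."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be positive")
    offset = (page - 1) * size
    if offset + size > MAX_RESULT_WINDOW:
        raise ValueError("from + size exceeds index.max_result_window")
    return {"from": offset, "size": size}

print(page_params(1, 3))  # {'from': 0, 'size': 3}
print(page_params(2, 3))  # {'from': 3, 'size': 3}
```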
4.3 Highlighting
4.3.1 Query with highlighting
# Uses the blog_postss data from the pagination example
GET /blog_postss/_search
{
"query": {
"match": {
"content": "Elasticsearch"
}
},
"highlight": {
"fields": {
"content": {} // default highlight settings: matches are wrapped in <em></em>
}
}
}
4.3.2 Custom highlight tags
GET /blog_postss/_search
{
"query": {
"match": {
"content": "Elasticsearch"
}
},
"highlight": {
"fields": {
"content": {
"pre_tags": ["<strong>"],
"post_tags": ["</strong>"]
}
}
}
}
4.3.3 Pagination with highlighting
GET /blog_postss/_search
{
"from": 0,
"size": 3,
"query": {
"match": {
"content": "Elasticsearch"
}
},
"highlight": {
"fields": {
"content": {
"pre_tags": ["<em>"],
"post_tags": ["</em>"]
}
}
}
}
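For each hit, highlighting returns fragments of the field text with the matched terms wrapped in pre_tags and post_tags (<em> and </em> by default). Here is a toy Python version for whole-word, case-insensitive matches. highlight is our own helper. Elasticsearch's highlighters work from the analyzed terms and also split long text into fragments, which this sketch does not do:

```python
import re

def highlight(text: str, terms: set[str], pre_tag: str = "<em>", post_tag: str = "</em>") -> str:
    """Wrap every word whose lowercase form is in terms with the given tags."""
    def wrap(m):
        word = m.group(0)
        return f"{pre_tag}{word}{post_tag}" if word.lower() in terms else word
    return re.sub(r"\w+", wrap, text)

text = "Learn how to use Elasticsearch, from beginner basics to advanced techniques."
print(highlight(text, {"elasticsearch"}))
# Learn how to use <em>Elasticsearch</em>, from beginner basics to advanced techniques.
print(highlight(text, {"elasticsearch"}, "<strong>", "</strong>"))
```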
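5. Summary
The examples were run in the Dev Tools console of Kibana 8.13.4. They are intended for learning and discussion only. If an example does not run in your environment, you will need to troubleshoot it yourself.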
From: https://blog.csdn.net/qq_71387716/article/details/140464774