The following practice exercises come from 铭毅天下's 《死磕ElasticSearch》 Knowledge Planet community.
Sample 1
Index index_a has multiple fields. Write a query that:
1) matches at least one of 'ssas' or 'sasa' in the title field;
2) boosts the score when the tags field (an array) contains 'pingpang'.
PUT index_a/_bulk
{"index":{"_id":1}}
{"title":"ssas is very nb", "tags":["pingpang", "basketball"]}
{"index":{"_id":2}}
{"title":"which is sasa","tags":["football"]}
{"index":{"_id":3}}
{"title":"which is ssas","tags":["basktball","football"]}
{"index":{"_id":4}}
{"title":"just for testing", "tags":["pingpang"]}
{"index":{"_id":5}}
{"title":"just for testing", "tags":["basketball"]}
{"index":{"_id":6}}
{"title":"just for testing", "tags":["football"]}
{"index":{"_id":7}}
{"title":"ssas sasa is very good", "tags":["pingpang"]}
Solution 1: bool query
GET index_a/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"title": "ssas sasa"
}
}
],
"should": [
{
"match": {
"tags": {
"query": "pingpang",
"boost": 2
}
}
}
]
}
}
}
Solution 2: function_score
GET index_a/_search
{
"query": {
"function_score": {
"query": {
"match": {
"title": "ssas sasa"
}
},
"functions": [
{
"filter": {"match": {"tags": "pingpang"}},
"weight": 5
}
]
}
}
}
Sample 2
A document contains text like "dog & cat". Index this document so that a match_phrase query for either "dog & cat" or "dog and cat" matches it.
Solution 1: mapping char_filter
PUT /my-index-000001
{
"settings": {
"analysis": {
"analyzer": {
"my_analyzer": {
"tokenizer": "standard",
"char_filter": [
"my_mappings_char_filter"
]
}
},
"char_filter": {
"my_mappings_char_filter": {
"type": "mapping",
"mappings": [
"& => and"
]
}
}
}
},
"mappings": {
"properties": {
"message": {
"type": "text",
"analyzer": "my_analyzer"
}
}
}
}
Solution 2: synonym token filter
Note: the tokenizer must be whitespace here, not standard, because the standard tokenizer would discard the & token before the synonym filter ever sees it.
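To see the difference, compare the two tokenizers with _analyze: standard emits only [dog, cat], while whitespace keeps the & token:
GET _analyze
{
  "tokenizer": "standard",
  "text": "dog & cat"
}
GET _analyze
{
  "tokenizer": "whitespace",
  "text": "dog & cat"
}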
PUT /my-index-000001
{
"settings": {
"analysis": {
"analyzer": {
"my_analyzer": {
"tokenizer": "whitespace",
"filter": [
"my_synonym"
]
}
},
"filter": {
"my_synonym": {
"type": "synonym",
"synonyms": [
"& => and"
]
}
}
}
},
"mappings": {
"properties": {
"message": {
"type": "text",
"analyzer": "my_analyzer"
}
}
}
}
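Either setup can be verified by indexing the sample text and running both phrase queries against my-index-000001; both should return the document:
PUT my-index-000001/_doc/1
{
  "message": "dog & cat"
}
GET my-index-000001/_search
{
  "query": {
    "match_phrase": {
      "message": "dog and cat"
    }
  }
}
GET my-index-000001/_search
{
  "query": {
    "match_phrase": {
      "message": "dog & cat"
    }
  }
}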
Sample 3
index_a contains some documents. Create an index index_b and use the Reindex API to copy index_a's documents into it, adding two fields: an integer field whose value is the character length of index_a's field_x, and an array field holding the words of field_y (field_y is a space-separated word list, e.g. "foo bar" becomes ["foo", "bar"] once indexed into index_b).
Solution 1: ingest script processor (the simulate calls below use x/y as stand-ins for field_x/field_y; see the reindex sketch after Solution 2)
POST _ingest/pipeline/_simulate
{
"pipeline": {
"processors": [
{
"script": {
"source": """
ctx.x_length = ctx.x.length();
String[] ysplit = ctx.y.splitOnToken(" ");
ArrayList ylist = new ArrayList();
for (int i = 0; i < ysplit.length; i++) {
  ylist.add(ysplit[i]);  // Painless statements need terminating semicolons
}
ctx.y_list = ylist;
"""
}
}
]
},
"docs": [
{
"_source": {
"x": "hello",
"y": "foo bar"
}
}
]
}
Solution 2: ingest script + split processor
POST _ingest/pipeline/_simulate
{
"pipeline": {
"processors": [
{
"script": {
"source": """
ctx.x_length = ctx.x.length();
"""
}
},
{
"split": {
"field": "y",
"separator": " ",
"target_field": "y_list"
}
}
]
},
"docs": [
{
"_source": {
"x": "hello",
"y": "foo bar zee"
}
}
]
}
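The _simulate calls above only preview the pipeline; the task itself asks for a reindex into index_b. A minimal sketch of the remaining steps, assuming the pipeline name task3_pipeline (hypothetical) and the Solution 2 processors:
PUT _ingest/pipeline/task3_pipeline
{
  "processors": [
    {
      "script": {
        "source": "ctx.x_length = ctx.x.length();"
      }
    },
    {
      "split": {
        "field": "y",
        "separator": " ",
        "target_field": "y_list"
      }
    }
  ]
}
POST _reindex
{
  "source": {
    "index": "index_a"
  },
  "dest": {
    "index": "index_b",
    "pipeline": "task3_pipeline"
  }
}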
Sample 4
Run a Reindex that does two things:
- trim the leading/trailing whitespace of every element of an array field in the source index
- add a new field whose value is the concatenation of two fields from the source index
Solution (validated with _simulate; a registration and reindex sketch follows):
POST _ingest/pipeline/_simulate
{
"pipeline": {
"processors": [
{
  "foreach": {
    "field": "x",
    "processor": {
      "trim": {
        "field": "_ingest._value"
      }
    }
  }
},
{
  "script": {
    "source": "ctx.yz = ctx.y + ' ' + ctx.z"
  }
}
]
},
"docs": [
{
"_source": {
"x": ["foo ", " bar"],
"y": "hello",
"z": "world"
}
}
]
}
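As in Sample 3, registering the pipeline and passing it to _reindex completes the task. A sketch, with the pipeline name task4_pipeline and the index names source_index/dest_index assumed:
PUT _ingest/pipeline/task4_pipeline
{
  "processors": [
    {
      "foreach": {
        "field": "x",
        "processor": {
          "trim": {
            "field": "_ingest._value"
          }
        }
      }
    },
    {
      "script": {
        "source": "ctx.yz = ctx.y + ' ' + ctx.z"
      }
    }
  ]
}
POST _reindex
{
  "source": {
    "index": "source_index"
  },
  "dest": {
    "index": "dest_index",
    "pipeline": "task4_pipeline"
  }
}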
Sample 5
Query 'xxx' across the three fields a/b/c, with field c boosted by 2 and the per-field scores summed.
Solution 1: multi_match with type most_fields, which sums the scores of all matching fields. (The examples below reuse index_a's title and tags fields from Sample 1; map them onto the question's a/b/c fields and attach the boost where required.)
GET index_a/_search
{
"query": {
"multi_match": {
"type": "most_fields",
"query": "ssas",
"fields": ["title^2", "tags"]
}
}
}
Solution 2: bool query with should clauses
GET index_a/_search
{
"query": {
"bool": {
"should": [
{
"match": {
"title": {
"query": "ssas",
"boost": 2
}
}
},
{
"match": {
"tags": "ssas"
}
}
]
}
}
}
Sample 6
Define a pipeline and update the documents of the eathquakes index:
- the pipeline ID is eathquakes_pipeline
- change the magnitude_type field value to uppercase
- if a document does not contain "batch_number", add the field with the value 1
- if batch_number already exists, increment it by 1
Solution: first validate the processors with _simulate; the pipeline registration and _update_by_query follow the simulate call.
POST _ingest/pipeline/_simulate
{
"pipeline": {
"processors": [
{
"uppercase": {
"field": "magnitude_type"
}
},
{
"script": {
"source": """
if(ctx.batch_number == null){
ctx.batch_number = 1;
}else{
ctx.batch_number++;
}
"""
}
}
]
},
"docs": [
{
"_source": {
"magnitude_type": "foo"
}
},
{
"_source": {
"magnitude_type": "bar",
"batch_number": 2
}
}
]
}
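The simulate call validates the processors; the task also requires registering the pipeline under the ID eathquakes_pipeline and applying it to the existing documents. _update_by_query accepts a pipeline parameter for exactly this (a sketch, same processors as above):
PUT _ingest/pipeline/eathquakes_pipeline
{
  "processors": [
    {
      "uppercase": {
        "field": "magnitude_type"
      }
    },
    {
      "script": {
        "source": """
          if (ctx.batch_number == null) {
            ctx.batch_number = 1;
          } else {
            ctx.batch_number++;
          }
        """
      }
    }
  ]
}
POST eathquakes/_update_by_query?pipeline=eathquakes_pipeline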
Sample 7
The earthquakes index holds the past 11 months of earthquake data. Return all of the following with a single query:
- the average earthquake magnitude per month over the past 11 months
- the month with the highest monthly average magnitude, and that average
- no documents in the search response
Solution: a date_histogram with a nested avg, plus a max_bucket pipeline aggregation; "size": 0 suppresses the hits.
GET earthquakes/_search
{
"size": 0,
"aggs": {
"monthly_aggs": {
"date_histogram": {
"field": "time",
"calendar_interval": "month"
},
"aggs": {
"avg_magnitude": {
"avg": {
"field": "magnitude"
}
}
}
},
"max_avg_monthly_magnitude": {
"max_bucket": {
"buckets_path": "monthly_aggs>avg_magnitude"
}
}
}
}
Setup and test data (create the mapping before the bulk; the final DELETE is cleanup only):
PUT earthquakes
{
  "mappings": {
    "properties": {
      "time": {
        "type": "date"
      },
      "magnitude": {
        "type": "integer"
      }
    }
  }
}
POST earthquakes/_bulk
{"index":{"_id":1}}
{"time":"2019-01-01T17:00:00", "magnitude":1}
{"index":{"_id":2}}
{"time":"2019-01-01T20:00:00", "magnitude":3}
{"index":{"_id":3}}
{"time":"2019-02-01T17:00:00", "magnitude":4}
{"index":{"_id":4}}
{"time":"2019-02-20T17:00:00", "magnitude":5}
{"index":{"_id":5}}
{"time":"2019-11-01T17:00:00", "magnitude":7}
{"index":{"_id":6}}
{"time":"2019-11-01T17:00:00", "magnitude":8}
{"index":{"_id":7}}
{"time":"2019-11-01T17:00:00", "magnitude":9}
DELETE earthquakes
Sample 8
Install and configure a hot & warm cluster:
- three nodes: node 1 is hot, node 2 is warm, node 3 is cold
- all three nodes are master-eligible
- newly created indices write to the hot node
- move the data from the hot node to the warm node with a single command
Solution:
First configure a node attribute in each node's elasticsearch.yml:
node.attr.hot_warm_type: hot    # node 1
node.attr.hot_warm_type: warm   # node 2
node.attr.hot_warm_type: cold   # node 3
DELETE hotwarm_index
PUT hotwarm_index
{
"settings": {
"index.routing.allocation.include.hot_warm_type": "hot",
"number_of_replicas": 0,
"number_of_shards": 1
}
}
PUT hotwarm_index/_bulk
{"index":{"_id":1}}
{"name":"foo"}
{"index":{"_id":2}}
{"name":"bar"}
GET _cat/shards?v
PUT hotwarm_index/_settings
{
"index.routing.allocation.include.hot_warm_type": "warm"
}
GET _cat/shards?v
Sample 9
ILM + data stream: data first lands on data_hot nodes; roll over after 2 minutes; 5 minutes after rollover, migrate to data_warm; 3 minutes later, migrate to data_cold; and delete 6 minutes after that (so warm at min_age 5m, cold at 8m, delete at 14m, all measured from rollover).
Solution (in execution order):
# 1. The ILM policy with the phase timings derived above
PUT _ilm/policy/test_policy
{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0m",
        "actions": {
          "rollover": {
            "max_age": "2m"
          }
        }
      },
      "warm": {
        "min_age": "5m",
        "actions": {}
      },
      "cold": {
        "min_age": "8m",
        "actions": {}
      },
      "delete": {
        "min_age": "14m",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}
# 2. The index template must declare "data_stream": {} so the data stream is auto-created on first write
PUT _index_template/my-datastream-template
{
  "index_patterns": [
    "my-datastream*"
  ],
  "data_stream": {},
  "template": {
    "settings": {
      "number_of_replicas": 0,
      "number_of_shards": 1,
      "index.lifecycle.name": "test_policy"
    }
  }
}
# 3. Shorten the ILM poll interval, otherwise the minute-level min_age values won't be honored (default is 10m)
PUT _cluster/settings
{
  "persistent": {
    "indices.lifecycle.poll_interval": "3s"
  }
}
# 4. Write a document. Must use POST (or PUT with op_type=create): data streams only accept newly created docs
POST my-datastream/_doc
{
  "message": "a",
  "@timestamp": "2099-05-06T16:21:15.000Z"
}
# 5. Inspect
GET my-datastream
GET _data_stream/my-datastream
GET .ds-my-datastream-2022.02.26-000001/_ilm/explain   # ILM status of the backing index
GET _cat/shards/.ds-my-datastream-2022.02.26-000001?v  # shard allocation of the backing index
# Cleanup
DELETE _data_stream/my-datastream
A few things to note about ILM:
- Every ILM-managed index has an age, i.e. how long it has existed, and every phase has a min_age: the index enters a phase once its age reaches that phase's min_age. min_age must therefore increase from phase to phase, never decrease. If the hot phase has a rollover action, the age is reset at rollover, so the min_age of every phase after hot is measured from the rollover time.
- An index only enters the next phase after all actions of the current phase have completed. If the current phase takes so long that the next phase's min_age has already passed, the index will pass through that next phase almost immediately and move on to the one after it.
- A min_age of 0 means "enter this phase immediately", which is why the hot phase is given min_age 0; later phases still wait for the hot phase's actions to finish.
- The cluster setting indices.lifecycle.poll_interval controls how often ILM checks for phase transitions; the default is 10 minutes. With small min_age values, transitions will not happen on schedule unless poll_interval is lowered accordingly.
- Data streams and ILM do not depend on each other and can each be used alone. ILM automates index management for both data streams and ordinary indices, while a data stream without ILM has to be managed by hand, so the two are best used together.
- If ILM manages an index + alias setup and rollover is needed, it must be paired with an index template, otherwise the indices that rollover creates will not be ILM-managed. Without rollover no index template is needed; ILM then just manages the single index.
- In a hot-warm architecture scheduled with Elasticsearch's built-in data tiers, remove the generic data role from node.roles, otherwise "index.routing.allocation.include._tier_preference": "data_hot" has no effect. Official explanation: "A node can belong to multiple tiers, but a node that has one of the specialized data roles cannot have the generic data role." (See the node.roles sketch after this list.)
- A primary shard and its replica cannot be allocated to the same node, so when only one node carries a given role, set the replica count to 0.
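A minimal elasticsearch.yml sketch of per-node role assignments for data tiers (role names are the built-in ones; each line belongs to a different node's config file, adjust to your topology):
node.roles: ["master", "data_hot"]    # hot node: specialized tier role only, no generic "data" role
node.roles: ["master", "data_warm"]   # warm node
node.roles: ["master", "data_cold"]   # cold node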
When ILM manages plain indices (index + alias) rather than a data stream, the setup is slightly more involved. An example:
PUT my-policy-index-000001
{
"aliases": {
"test_alias": {
"is_write_index": true
}
}
}
PUT _index_template/my-policy-index_template
{
"index_patterns": ["my-policy-index-*"],
"template": {
"settings": {
"number_of_shards": 1,
"number_of_replicas": 0,
"index.lifecycle.name": "test_policy",
"index.lifecycle.rollover_alias": "test_alias",
"index.routing.allocation.include._tier_preference": "data_hot"
}
}
}
PUT _ilm/policy/test_policy
{
"policy": {
"phases": {
"hot": {
"min_age": "0m",
"actions": {
"rollover": {
"max_age": "2m"
}
}
},
"warm": {
"min_age": "5m",
"actions": {}
},
"cold": {
"min_age": "8m",
"actions": {}
},
"delete": {
"min_age": "14m",
"actions": {
"delete": {}
}
}
}
}
}
Sample 10
The index task2 has a field field2, and a match query for "the" currently returns many documents. Rebuild task2 into a new index new_task2 so that the same match query for "the" returns nothing. (The examples below use test1 as the source index and test2/test3 as the rebuilt indexes.)
Solution 1: the built-in stop analyzer
PUT test1/_doc/1
{
"message": "you are the best"
}
PUT test2
{
"mappings" : {
"properties" : {
"message" : {
"type" : "text",
"analyzer": "stop",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
}
POST _reindex
{
"source": {
"index": "test1"
},
"dest": {
"index": "test2"
}
}
Solution 2: a custom analyzer with the stop token filter
PUT test1/_doc/1
{
"message": "you are the best"
}
PUT test3
{
"settings": {
"analysis": {
"analyzer": {
"stop_analyzer": {
"tokenizer": "standard",
"filter": ["stop"]
}
}
}
},
"mappings" : {
"properties" : {
"message" : {
"type" : "text",
"analyzer": "stop_analyzer",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
}
POST _reindex
{
"source": {
"index": "test1"
},
"dest": {
"index": "test2"
}
}
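Verification: a match query for "the" on the rebuilt index should return no hits, since the stop filter removes "the" both at index time and from the analyzed query string:
GET test3/_search
{
  "query": {
    "match": {
      "message": "the"
    }
  }
}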
Sample 11
In the test index, create a runtime field whose value is field A minus field B, and run a range aggregation on it with three buckets:
- below 0
- 0 to 100
- 100 and above
- the response must contain no documents
Solution (demonstrated on a test4 index; "size": 0 satisfies the last requirement):
POST test4/_bulk
{"index": {}}
{"A": 100, "B": 200}
{"index": {}}
{"A": 10, "B": 20}
{"index": {}}
{"A": 200, "B": 20}
{"index": {}}
{"A": 100, "B": 20}
{"index": {}}
{"A": 100, "B": 50}
GET test4/_search
{
"size": 0,
"runtime_mappings": {
"C": {
"type": "long",
"script": {
"source": "emit(doc['A'].value-doc['B'].value)"
}
}
},
"aggs": {
"caggs": {
"range": {
"field": "C",
"ranges": [
{
"to": 0
},
{
"from": 0,
"to": 100
},
{
"from": 100
}
]
}
}
}
}
Sample 12
The indexes testa and testb share a join field x. Build a new index that contains all of testa's documents, each enriched via x with the matching data from testb.
Solution: an enrich policy on testb, executed and then applied through an ingest pipeline during reindex:
PUT testb/_bulk
{"index":{}}
{"b":10,"x":2}
{"index":{}}
{"b":5,"x":5}
PUT testa/_bulk
{"index":{}}
{"a":1,"x":2}
{"index":{}}
{"a":3,"x":2}
{"index":{}}
{"a":5,"x":4}
PUT /_enrich/policy/myenrich-policy
{
"match": {
"indices": "testb",
"match_field": "x",
"enrich_fields": ["x", "b"]
}
}
POST /_enrich/policy/myenrich-policy/_execute
PUT _ingest/pipeline/mypipeline
{
"processors": [
{
"enrich": {
"policy_name": "myenrich-policy",
"field": "x",
"target_field": "c"
}
}
]
}
POST _reindex
{
"source": {
"index": "testa"
},
"dest": {
"index": "testc",
"pipeline": "mypipeline"
}
}
Sample 13
Write a query against the task9 index on cluster one that satisfies all of the following:
- at least two of the fields 'a', 'b', 'c' match the keyword 'test'
- results are sorted by 'a' descending, then by '_score' ascending
- matches in the 'a' field are highlighted with the pre-tag "<h1>" and post-tag "</h1>"
Solution (demonstrated on a test6 index):
GET test6/_search
{
"query": {
"bool": {
"should": [
{"match": {"a": "test"}},
{"match": {"b": "test"}},
{"match": {"c": "test"}}
],
"minimum_should_match": 2
}
},
"highlight": {
"fields": {
"a": {}
},
"pre_tags": ["<h1>"],
"post_tags": ["</h1>"]
},
"sort": [
{
"a.keyword": {
"order": "desc"
}
},
{
"_score": {
"order": "asc"
}
}
]
}
PUT test6/_bulk
{"index": {}}
{"a": "test", "b": "foo", "c": "bar"}
{"index": {}}
{"a": "test", "b": "test", "c": "bar"}
{"index": {}}
{"a": "test", "b": "foo", "c": "test"}
Sample 14
Troubleshoot a cluster that has turned red or yellow.
Solution: drill down from cluster level to index level to shard level; _cluster/allocation/explain usually names the reason a shard is unassigned:
GET _cluster/health
GET _cluster/health?level=indices
GET _cluster/health/my-index-000001?level=shards
GET /_cat/shards/my-index-000001?v
GET _cat/indices?health=yellow&v
GET _cluster/allocation/explain
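The fix depends on what the explain output reports. As one common example: a yellow single-node cluster whose replicas cannot be assigned (there is no second node to hold them) is resolved by dropping the replica count (index name my-index-000001 as above):
PUT my-index-000001/_settings
{
  "number_of_replicas": 0
}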
Sample 15
Create a search template named task10 that satisfies the following:
- searches field a with the parameter search_string
- range-filters the timestamp field using start_date and end_date parameters; when end_date is not supplied, the end of the range defaults to now
- highlights the content of field a, wrapped in <strong> and </strong>
- sorts results first by field b, then by score
Then write a search against the movie index that uses the task10 template with search_string set to star.
Solution (demonstrated with a task5 index and a stored template named task5_template):
GET task5/_search/template
{
"id": "task5_template",
"params": {
"search_string": "foo",
"start_date": "2022-01-01"
}
}
PUT task5/_bulk
{"index": {}}
{"a": "foo", "b": 10, "timestamp": "2022-01-01"}
{"index": {}}
{"a": "foo", "b": 4, "timestamp": "2022-02-01"}
{"index": {}}
{"a": "foo bar", "b": 34, "timestamp": "2022-03-01"}
{"index": {}}
{"a": "bar", "b": 2, "timestamp": "2021-01-01"}
PUT _scripts/task5_template
{
"script": {
"lang": "mustache",
"source": """
{
"query": {
"bool": {
"filter": [
{"match": {"a": "{{search_string}}"}},
{"range": {
"timestamp": {
"gte": "{{start_date}}",
"lte": "{{end_date}}{{^end_date}}now/d{{/end_date}}"
}
}}
]
}
},
"highlight": {
"fields": {
"a": {}
},
"pre_tags": ["<strong>"],
"post_tags": ["</strong>"]
},
"sort": [
{
"b": {
"order": "desc"
}
},
{
"_score": {
"order": "asc"
}
}
]
}
"""
}
}
Note: the template source is long and Kibana gives no autocompletion inside it, so writing it directly is error-prone. Write the query as a normal _search request first, then copy it into the source field. When copying, be sure to keep the outer braces around the query, i.e. the form "source": """ {"query": {}} """, not "source": """ "query": {} """.
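A related check: _render/template returns the JSON that a stored template produces for a given set of params, which makes it easy to verify the {{^end_date}} default before running a real search:
POST _render/template
{
  "id": "task5_template",
  "params": {
    "search_string": "foo",
    "start_date": "2022-01-01"
  }
}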
Sample 16
Run a term query on field a and a match query on field b, and weight the score by a field c that is derived from two other fields.
Solution: define the derived field as a runtime field and apply it through function_score's script_score:
GET task6/_search
{
"runtime_mappings": {
"z": {
"type": "long",
"script": {
"source": "emit(doc['x'].value + doc['y'].value)"
}
}
},
"query": {
"function_score": {
"query": {
"bool": {
"must": [
{"match": {"b": "hello"}},
{"term": {"a": "foo"}}
]
}
},
"script_score": {
"script": {
"source": "_score * doc['z'].value"
}
}
}
}
}
PUT task6/_bulk
{"index": {}}
{"x": 2, "y": 4, "a": "foo", "b": "hello world"}
{"index": {}}
{"x": 100, "y": 50, "a": "bar", "b": "hello world 1"}
{"index": {}}
{"x": 200, "y": 10, "a": "foo", "b": "hello"}
Note what function_score does: it wraps a base query and modifies that query's score.