Elastic Certified Engineer Practices

Date: 2023-03-11 20:11:24
Tags: index Elastic GET type source Practices PUT my Certified

The practice exercises below come from the 《死磕ElasticSearch》 Knowledge Planet (知识星球) community run by 铭毅天下.

Sample 1

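The index index_a has several fields. Write a query that meets the following requirements:
1) The title field must match 'ssas' or 'sasa' (at least one of the two).
2) For the tags field (an array field), boost the score if it contains 'pingpang'.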

PUT index_a/_bulk
{"index":{"_id":1}}
{"title":"ssas is very nb", "tags":["pingpang", "basketball"]}
{"index":{"_id":2}}
{"title":"which is sasa","tags":["football"]}
{"index":{"_id":3}}
{"title":"which is ssas","tags":["basketball","football"]}
{"index":{"_id":4}}
{"title":"just for testing", "tags":["pingpang"]}
{"index":{"_id":5}}
{"title":"just for testing", "tags":["basketball"]}
{"index":{"_id":6}}
{"title":"just for testing", "tags":["football"]}
{"index":{"_id":7}}
{"title":"ssas sasa is very good", "tags":["pingpang"]}

Solution 1: bool query

GET index_a/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "title": "ssas sasa"
          }
        }
      ],
      "should": [
        {
          "match": {
            "tags": {
              "query": "pingpang",
              "boost": 2
            }
          }
        }
      ]
    }
  }
}

Solution 2: function_score

GET index_a/_search
{
  "query": {
    "function_score": {
      "query": {
        "match": {
          "title": "ssas sasa"
        }
      },
      "functions": [
        {
          "filter": {"match": {"tags": "pingpang"}},
          "weight": 5
        }
      ]
    }
  }
}
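
To sanity-check either solution against the sample data, the expected hits can be worked out by hand. The sketch below is a plain-Python model of the matching rules only, not of Elasticsearch scoring: it assumes simple lowercase whitespace tokenization, and the helper name title_matches is made up for illustration.

```python
# Plain-Python model of Sample 1's matching rules (not Elasticsearch scoring).
docs = {
    1: {"title": "ssas is very nb", "tags": ["pingpang", "basketball"]},
    2: {"title": "which is sasa", "tags": ["football"]},
    3: {"title": "which is ssas", "tags": ["basketball", "football"]},
    4: {"title": "just for testing", "tags": ["pingpang"]},
    5: {"title": "just for testing", "tags": ["basketball"]},
    6: {"title": "just for testing", "tags": ["football"]},
    7: {"title": "ssas sasa is very good", "tags": ["pingpang"]},
}


def title_matches(doc, terms=("ssas", "sasa")):
    # A match query with the default OR operator needs at least one term.
    tokens = doc["title"].lower().split()
    return any(term in tokens for term in terms)


# Documents returned by the query: the title requirement is mandatory.
matched = [doc_id for doc_id, doc in docs.items() if title_matches(doc)]
# Documents whose score is raised: returned AND tagged with "pingpang".
boosted = [doc_id for doc_id in matched if "pingpang" in docs[doc_id]["tags"]]

print(matched)  # [1, 2, 3, 7]
print(boosted)  # [1, 7]
```

Document 4 carries the pingpang tag but is not returned, because the tag only influences the score and never makes a document match on its own.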

Sample 2

A document contains text such as "dog & cat". Index it so that a match_phrase query for either "dog & cat" or "dog and cat" matches.

Solution 1: use a char_filter

PUT /my-index-000001
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "char_filter": [
            "my_mappings_char_filter"
          ]
        }
      },
      "char_filter": {
        "my_mappings_char_filter": {
          "type": "mapping",
          "mappings": [
            "& => and"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "message": {
        "type": "text",
        "analyzer": "my_analyzer"
      }
    }
  }
}

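Solution 2: use a synonym token filter

Note: the tokenizer must be whitespace rather than standard, because the standard tokenizer drops the & character before the synonym filter can see it.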

PUT /my-index-000001
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "whitespace",
          "filter": [
            "my_synonym"
          ]
        }
      },
      "filter": {
        "my_synonym": {
          "type": "synonym",
          "synonyms": [
            "& => and"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "message": {
        "type": "text",
        "analyzer": "my_analyzer"
      }
    }
  }
}
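
The difference between the two solutions comes down to when "&" is rewritten relative to tokenization. The following is a rough Python model of that ordering, not the real analyzers: the regular expression is only an approximation of the standard tokenizer, and all function names are made up for illustration.

```python
import re


def char_filter(text):
    # Mapping char filter from Solution 1: rewrite "&" before tokenizing.
    return text.replace("&", "and")


def standard_like_tokens(text):
    # Rough stand-in for the standard tokenizer: only runs of word
    # characters become tokens, so a bare "&" is dropped.
    return re.findall(r"\w+", text)


def whitespace_tokens(text):
    # Whitespace tokenizer: split on whitespace and keep "&" as a token.
    return text.split()


def synonym_filter(tokens):
    # Synonym rule from Solution 2: "& => and".
    return ["and" if tok == "&" else tok for tok in tokens]


solution1 = standard_like_tokens(char_filter("dog & cat"))
solution2 = synonym_filter(whitespace_tokens("dog & cat"))
# Standard-like tokenizer followed by the synonym filter: "&" is already gone.
broken = synonym_filter(standard_like_tokens("dog & cat"))

print(solution1)  # ['dog', 'and', 'cat']
print(solution2)  # ['dog', 'and', 'cat']
print(broken)     # ['dog', 'cat']
```

In both working variants the indexed tokens are the same as those of "dog and cat", which is what lets match_phrase succeed for either spelling.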

Sample 3

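index_a contains some documents. Create index_b and use the reindex API to copy the documents from index_a into it. Add an integer field whose value is the character length of field_x in index_a, and add an array field whose value is the set of words in field_y. (field_y is a space-separated list of words; for example, "foo bar" must become ["foo", "bar"] once it is indexed into index_b.)

Solution 1: ingest script processor (simulated here with shortened field names x and y; for the actual task, store the pipeline with PUT _ingest/pipeline/<id> and reference it in dest.pipeline of the _reindex request)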

POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "script": {
          "source": """
            ctx.x_length = ctx.x.length();
            String[] ysplit = ctx.y.splitOnToken(" ");
            ArrayList ylist = new ArrayList();
            for (int i=0; i<ysplit.length; i++){
              ylist.add(ysplit[i]);
            }
            ctx.y_list = ylist;
          """
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "x": "hello",
        "y": "foo bar"
      }
    }
  ]
}

Solution 2: ingest script + split processor

POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "script": {
          "source": """
            ctx.x_length = ctx.x.length();
          """
        }
      },
      {
        "split": {
          "field": "y",
          "separator": " ",
          "target_field": "y_list"
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "x": "hello",
        "y": "foo bar zee"
      }
    }
  ]
}

Sample 4

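Run a reindex that does the following two things:

  • Trim leading and trailing whitespace from every item of one field of the source index (the field is an array)
  • Add a new field whose value is the concatenation of two fields of the source index

Solution: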

POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "foreach": {
          "field": "x",
          "processor": {
            "trim": {
              "field": "_ingest._value"
            }
          }
        }
      },
      {
        "script": {
          "source": "ctx.yz = ctx.y + ' ' + ctx.z"
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "x": ["foo ", " bar"],
        "y": "hello",
        "z": "world"
      }
    }
  ]
}

Sample 5

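Search three fields a/b/c for xxx, boosting field c by 2, with the per-field scores added together.

Solution 1: multi_match (demonstrated on index_a from Sample 1 with two fields, boosting title)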

GET index_a/_search
{
  "query": {
    "multi_match": {
      "type": "most_fields",
      "query": "ssas",
      "fields": ["title^2", "tags"]
    }
  }
}

Solution 2: bool query with should clauses

GET index_a/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "title": {
              "query": "ssas",
              "boost": 2
            }
          }
        },
        {
          "match": {
            "tags": "ssas"
          }
        }
      ]
    }
  }
}

Sample 6

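Define a pipeline and use it to update the documents in the earthquakes index:

  • The pipeline ID is earthquakes_pipeline
  • Convert the value of the magnitude_type field to uppercase
  • If a document has no "batch_number" field, add it with the value 1
  • If a document already has batch_number, increment its value by 1

Solution (the simulation below only checks the processors; for the actual task, store them with PUT _ingest/pipeline/earthquakes_pipeline and apply them with POST earthquakes/_update_by_query?pipeline=earthquakes_pipeline):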

POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "uppercase": {
          "field": "magnitude_type"
        }
      },
      {
        "script": {
          "source": """
            if(ctx.batch_number == null){
              ctx.batch_number = 1;
            }else{
              ctx.batch_number++;
            }
          """
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "magnitude_type": "foo"
      }
    },
    {
      "_source": {
        "magnitude_type": "bar",
        "batch_number": 2
      }
    }
  ]
}

Sample 7

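The earthquakes index holds earthquake data for the past 11 months. With a single query, obtain the following:

  • The average earthquake magnitude for each of the past 11 months
  • The month with the highest average magnitude in the past 11 months, together with that average
  • The search must not return any documents

Solution: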

GET earthquakes/_search
{
  "size": 0,
  "aggs": {
    "monthly_aggs": {
      "date_histogram": {
        "field": "time",
        "calendar_interval": "month"
      },
      "aggs": {
        "avg_magnitude": {
          "avg": {
            "field": "magnitude"
          }
        }
      }
    },
    "max_avg_monthly_magnitude": {
      "max_bucket": {
        "buckets_path": "monthly_aggs>avg_magnitude"
      }
    }
  }
}

# Test setup: create the index, then load the sample data (run before the query above)
PUT earthquakes
{
  "mappings": {
    "properties": {
      "time": {
        "type": "date"
      },
      "magnitude": {
        "type": "integer"
      }
    }
  }
}

POST earthquakes/_bulk
{"index":{"_id":1}}
{"time":"2019-01-01T17:00:00", "magnitude":1}
{"index":{"_id":2}}
{"time":"2019-01-01T20:00:00", "magnitude":3}
{"index":{"_id":3}}
{"time":"2019-02-01T17:00:00", "magnitude":4}
{"index":{"_id":4}}
{"time":"2019-02-20T17:00:00", "magnitude":5}
{"index":{"_id":5}}
{"time":"2019-11-01T17:00:00", "magnitude":7}
{"index":{"_id":6}}
{"time":"2019-11-01T17:00:00", "magnitude":8}
{"index":{"_id":7}}
{"time":"2019-11-01T17:00:00", "magnitude":9}

# Cleanup
DELETE earthquakes
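
As a cross-check for the aggregation, the expected numbers can be computed by hand from the seven sample readings, assuming each one is stored as its own document. This is a plain-Python sketch of what avg and max_bucket compute, not Elasticsearch code; the real date_histogram also returns empty buckets for the months in between, which the sketch leaves out.

```python
from collections import defaultdict

# (time, magnitude) pairs taken from the sample data.
readings = [
    ("2019-01-01T17:00:00", 1),
    ("2019-01-01T20:00:00", 3),
    ("2019-02-01T17:00:00", 4),
    ("2019-02-20T17:00:00", 5),
    ("2019-11-01T17:00:00", 7),
    ("2019-11-01T17:00:00", 8),
    ("2019-11-01T17:00:00", 9),
]

# date_histogram with a calendar month interval: group by "YYYY-MM".
by_month = defaultdict(list)
for timestamp, magnitude in readings:
    by_month[timestamp[:7]].append(magnitude)

# avg sub-aggregation: one average per monthly bucket.
monthly_avg = {month: sum(vals) / len(vals) for month, vals in by_month.items()}

# max_bucket pipeline aggregation: the bucket with the highest average.
best_month = max(monthly_avg, key=monthly_avg.get)

print(monthly_avg)                          # {'2019-01': 2.0, '2019-02': 4.5, '2019-11': 8.0}
print(best_month, monthly_avg[best_month])  # 2019-11 8.0
```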

Sample 8

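Install and configure a cluster with a hot & warm architecture:

  • Three nodes: node 1 is hot, node 2 is warm, and node 3 is cold
  • All three nodes are master-eligible
  • Newly created indices write their data to the hot node
  • Move the data from the hot node to the warm node with a single command

Solution:

First configure a node attribute: edit elasticsearch.yml on each node and add the line that matches that node's role:

node.attr.hot_warm_type: hot     # node 1
node.attr.hot_warm_type: warm    # node 2
node.attr.hot_warm_type: cold    # node 3

DELETE hotwarm_index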
PUT hotwarm_index
{
  "settings": {
      "index.routing.allocation.include.hot_warm_type": "hot",
      "number_of_replicas": 0,
      "number_of_shards": 1
  }
}

PUT hotwarm_index/_bulk
{"index":{"_id":1}}
{"name":"foo"}
{"index":{"_id":2}}
{"name":"bar"}

GET _cat/shards?v

PUT hotwarm_index/_settings
{
  "index.routing.allocation.include.hot_warm_type": "warm"
}

GET _cat/shards?v

Sample 9

ILM + data stream: data starts on data_hot and rolls over after 2 minutes; 5 minutes after that it moves to data_warm, 3 minutes later to data_cold, and 6 minutes later it is deleted.

Solution:

DELETE _data_stream/my-datastream
GET .ds-my-datastream-2022.02.26-000001/_ilm/explain # check the ILM status
GET _cat/shards/.ds-my-datastream-2022.02.26-000001?v # check the shard allocation of this index
GET my-datastream
GET _data_stream/my-datastream

# Must use POST, or PUT with op_type=create, i.e. the request has to create a new doc
POST my-datastream/_doc
{
  "message": "a",
  "@timestamp": "2099-05-06T16:21:15.000Z"
}

# `data_stream: {}` must be set so that the data stream is created automatically
PUT _index_template/my-datastream-template
{
  "index_patterns": [
    "my-datastream*"
  ],
  "data_stream": {},
  "template": {
    "settings": {
      "number_of_replicas": 0,
      "number_of_shards": 1,
      "index.lifecycle.name": "test_policy"
    }
  }
}

PUT _ilm/policy/test_policy
{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0m",
        "actions": {
          "rollover": {
            "max_age": "2m"
          }
        }
      },
      "warm": {
        "min_age": "5m",
        "actions": {}
      },
      "cold": {
        "min_age": "8m",
        "actions": {}
      },
      "delete": {
        "min_age": "14m",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}
PUT _cluster/settings
{
  "persistent": {
    "indices.lifecycle.poll_interval": "3s"
  }
}

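A few points about ILM are worth noting:

  1. Every ILM-managed index has an age, and every phase has a min_age: the index enters a phase once its age reaches that phase's min_age. The min_age values therefore have to increase from phase to phase; a later phase cannot have a smaller min_age than an earlier one. If the policy has a rollover action, the age is reset at rollover, so the min_age of every phase after hot is counted from the rollover.
  2. An index moves to the next phase only after the actions of the current phase have completed. If a phase is delayed for so long that the min_age of the following phase has already passed, the index spends very little time in that following phase and quickly moves on to the one after it.
  3. A min_age of 0 means the phase is entered immediately, which is why the hot phase sets min_age to 0. The next phase runs only after the actions of the hot phase have completed.
  4. ILM has a cluster setting, indices.lifecycle.poll_interval, which is the interval at which phase transitions are checked. The default is 10 minutes, so with very small min_age values the transitions do not happen when expected, and poll_interval has to be lowered accordingly.
  5. Data streams and ILM do not depend on each other and can each be used on its own. ILM automates index management, for data streams as well as regular indices; a data stream without ILM has to be managed by hand, so the two work best together.
  6. When ILM manages an index + alias and rollover is needed, it only makes sense together with an index template; otherwise the indices created by rollover are not managed by ILM. Without rollover no index template is needed, and ILM simply manages the single index.
  7. With a hot-warm architecture that relies on Elasticsearch's built-in data tiers, the generic data role has to be removed from node.roles for tier allocation to take effect; otherwise setting "index.routing.allocation.include._tier_preference": "data_hot" has no effect. The official explanation: "A node can belong to multiple tiers, but a node that has one of the specialized data roles cannot have the generic data role."
  8. A primary shard and its replica cannot live on the same node, so when only one node has a given role, number_of_replicas has to be set to 0.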
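
Point 1 explains how the task's "N minutes after the previous step" wording maps to the min_age values in the policy: because every min_age is counted from the rollover, each one is the running total of the delays before it. A small Python sketch of that arithmetic (the function name cumulative_min_ages is made up for illustration):

```python
def cumulative_min_ages(delays_after_rollover):
    """Convert per-step delays (in minutes) into ILM min_age values.

    Every min_age is measured from the rollover, so each value is the
    running total of all delays up to and including that phase.
    """
    total = 0
    min_ages = {}
    for phase, delay in delays_after_rollover:
        total += delay
        min_ages[phase] = f"{total}m"
    return min_ages


# Task: warm 5 minutes after rollover, cold 3 minutes later, delete 6 minutes later.
print(cumulative_min_ages([("warm", 5), ("cold", 3), ("delete", 6)]))
# {'warm': '5m', 'cold': '8m', 'delete': '14m'}
```

The result matches the 5m, 8m and 14m used in test_policy; the 2-minute rollover itself is not part of the sum, because it is expressed through max_age in the hot phase.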

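If ILM manages indices without a data stream, the setup is slightly more involved. An example follows; when running it, create the index template and the policy first and the bootstrap index last, so that the index picks up the template's ILM settings: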

PUT my-policy-index-000001
{
  "aliases": {
    "test_alias": {
      "is_write_index": true
    }
  }
}

PUT _index_template/my-policy-index_template
{
  "index_patterns": ["my-policy-index-*"],
  "template": {
    "settings": {
      "number_of_shards": 1,
      "number_of_replicas": 0,
      "index.lifecycle.name": "test_policy",
      "index.lifecycle.rollover_alias": "test_alias",
      "index.routing.allocation.include._tier_preference": "data_hot"
    }
  }
}

PUT _ilm/policy/test_policy
{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0m",
        "actions": {
          "rollover": {
            "max_age": "2m"
          }
        }
      },
      "warm": {
        "min_age": "5m",
        "actions": {}
      },
      "cold": {
        "min_age": "8m",
        "actions": {}
      },
      "delete": {
        "min_age": "14m",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}

Sample 10

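The index task2 has a field field2, and a match query for "the" returns many documents. Reindex task2 into a new index named new_task2 so that a match query for "the" returns nothing.

Solution 1: use the stop analyzer (demonstrated with scratch indices test1 and test2 and a message field)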

PUT test1/_doc/1
{
  "message": "you are the best"
}
PUT test2
{
  "mappings" : {
      "properties" : {
        "message" : {
          "type" : "text",
          "analyzer": "stop",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    }
}
POST _reindex
{
  "source": {
    "index": "test1"
  },
  "dest": {
    "index": "test2"
  }
}

Solution 2: use a stop token filter (demonstrated with scratch indices test1 and test3)

PUT test1/_doc/1
{
  "message": "you are the best"
}
PUT test3
{
  "settings": {
    "analysis": {
      "analyzer": {
        "stop_analyzer": {
          "tokenizer": "standard",
          "filter": ["stop"]
        }
      }
    }
  },
  "mappings" : {
      "properties" : {
        "message" : {
          "type" : "text",
          "analyzer": "stop_analyzer",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    }
}
POST _reindex
{
  "source": {
    "index": "test1"
  },
  "dest": {
    "index": "test3"
  }
}

Sample 11

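In the test index, create a runtime field whose value is field A minus field B, then build a range aggregation with three buckets. The search must return no documents:

  • Less than 0
  • 0 to 100
  • 100 and above
  • Number of returned documents: 0

Solution: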

POST test4/_bulk
{"index": {}}
{"A": 100, "B": 200}
{"index": {}}
{"A": 10, "B": 20}
{"index": {}}
{"A": 200, "B": 20}
{"index": {}}
{"A": 100, "B": 20}
{"index": {}}
{"A": 100, "B": 50}

GET test4/_search
{
  "size": 0,
  "runtime_mappings": {
    "C": {
      "type": "long",
      "script": {
        "source": "emit(doc['A'].value-doc['B'].value)"
      }
    }
  },
  "aggs": {
    "caggs": {
      "range": {
        "field": "C",
        "ranges": [
          {
            "to": 0
          },
          {
            "from": 0,
            "to": 100
          },
          {
            "from": 100
          }
        ]
      }
    }
  }
}
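
The expected bucket counts for the five sample documents can be checked by hand. The sketch below models the runtime field C = A - B and the three ranges in plain Python; the bucket labels are labels of this sketch only, not the keys Elasticsearch returns.

```python
# (A, B) pairs from the sample bulk request.
docs = [(100, 200), (10, 20), (200, 20), (100, 20), (100, 50)]


def bucket(value):
    # A range aggregation includes "from" and excludes "to".
    if value < 0:
        return "below 0"
    if value < 100:
        return "0 to 100"
    return "100 and above"


counts = {"below 0": 0, "0 to 100": 0, "100 and above": 0}
for a, b in docs:
    counts[bucket(a - b)] += 1  # runtime field C = A - B

print(counts)  # {'below 0': 2, '0 to 100': 2, '100 and above': 1}
```

Since "to" is exclusive, a difference of exactly 100 would land in the last bucket and a difference of exactly 0 in the middle one.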

Sample 12

The indices testa and testb share a join field x. Build a new index that contains all the data from testa and, joined through x, the corresponding data from testb.

Solution:

PUT testb/_bulk
{"index":{}}
{"b":10,"x":2}
{"index":{}}
{"b":5,"x":5}

PUT testa/_bulk
{"index":{}}
{"a":1,"x":2}
{"index":{}}
{"a":3,"x":2}
{"index":{}}
{"a":5,"x":4}

PUT /_enrich/policy/myenrich-policy
{
  "match": {
    "indices": "testb",
    "match_field": "x",
    "enrich_fields": ["x", "b"]
  }
}

POST /_enrich/policy/myenrich-policy/_execute

PUT _ingest/pipeline/mypipeline
{
  "processors": [
    {
      "enrich": {
        "policy_name": "myenrich-policy",
        "field": "x",
        "target_field": "c"
      }
    }
  ]
}

POST _reindex
{
  "source": {
    "index": "testa"
  },
  "dest": {
    "index": "testc",
    "pipeline": "mypipeline"
  }
}
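
To see what testc should contain afterwards, the enrich step can be modelled as a dictionary lookup. This is a plain-Python sketch of the join with the sample data, not the enrich implementation; it assumes a single match per value of x.

```python
testb = [{"b": 10, "x": 2}, {"b": 5, "x": 5}]
testa = [{"a": 1, "x": 2}, {"a": 3, "x": 2}, {"a": 5, "x": 4}]

# Model of the enrich index: match_field value -> the enrich_fields ("x", "b").
lookup = {doc["x"]: {"x": doc["x"], "b": doc["b"]} for doc in testb}

testc = []
for doc in testa:
    enriched = dict(doc)
    if doc["x"] in lookup:
        # A match adds the looked-up fields under target_field "c".
        enriched["c"] = lookup[doc["x"]]
    # Without a match the document is copied unchanged.
    testc.append(enriched)

for doc in testc:
    print(doc)
# {'a': 1, 'x': 2, 'c': {'x': 2, 'b': 10}}
# {'a': 3, 'x': 2, 'c': {'x': 2, 'b': 10}}
# {'a': 5, 'x': 4}
```

All three testa documents reach the new index; only the ones whose x has a counterpart in testb gain the c field.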

Sample 13

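On cluster one, write a query against the task9 index that meets the following requirements:

  • At least two of the fields 'a', 'b' and 'c' match the keyword 'test'
  • Sort the results by field 'a' in descending order first, then by '_score' in ascending order
  • Highlight field 'a' in the results, with "<h1>" as the opening tag and "</h1>" as the closing tag

Solution (tested here on a scratch index test6):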

GET test6/_search
{
  "query": {
    "bool": {
      "should": [
        {"match": {"a": "test"}},
        {"match": {"b": "test"}},
        {"match": {"c": "test"}}
      ],
      "minimum_should_match": 2
    }
  },
  "highlight": {
    "fields": {
      "a": {}
    },
    "pre_tags": ["<h1>"],
    "post_tags": ["</h1>"]
  },
  "sort": [
    {
      "a.keyword": {
        "order": "desc"
      }
    },
    {
      "_score": {
        "order": "asc"
      }
    }
  ]
}

# Test data (index it before running the query above)
PUT test6/_bulk
{"index": {}}
{"a": "test", "b": "foo", "c": "bar"}
{"index": {}}
{"a": "test", "b": "test", "c": "bar"}
{"index": {}}
{"a": "test", "b": "foo", "c": "test"}
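
The effect of minimum_should_match: 2 on the three test documents can be verified by counting matching fields. A plain-Python sketch, assuming simple lowercase whitespace tokenization; the helper name matching_fields is made up for illustration.

```python
docs = [
    {"a": "test", "b": "foo", "c": "bar"},
    {"a": "test", "b": "test", "c": "bar"},
    {"a": "test", "b": "foo", "c": "test"},
]


def matching_fields(doc, keyword="test", fields=("a", "b", "c")):
    # One should clause per field; a clause matches if the field
    # contains the keyword as a token.
    return [f for f in fields if keyword in doc[f].lower().split()]


# minimum_should_match: 2 keeps documents with at least two matching clauses.
hits = [doc for doc in docs if len(matching_fields(doc)) >= 2]

print(len(hits))                          # 2
print([matching_fields(d) for d in hits]) # [['a', 'b'], ['a', 'c']]
```

The first document matches only on field a and is therefore filtered out.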

Sample 14

Troubleshoot a cluster whose health has turned red or yellow.

Solution:

GET _cluster/health

GET _cluster/health?level=indices
GET _cluster/health/my-index-000001?level=shards

GET /_cat/shards/my-index-000001?v
GET _cat/indices?health=yellow&v

GET _cluster/allocation/explain

Sample 15

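Create a search template named task10 that meets the following conditions:

  • Field a is searched with the parameter search_string
  • The timestamp field is filtered by a range built from the start_date and end_date parameters; if end_date is not provided, the end time defaults to now
  • In the response, the content of field a is highlighted, wrapped in <strong> and </strong>
  • Results are sorted by field b first, then by score

Then write a search request against the movie index that uses the task10 search template with search_string set to star.

Solution (demonstrated with a scratch index task5 and the template ID task5_template; for the actual task, store the template as task10 and search the movie index with search_string set to star):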

GET task5/_search/template
{
  "id": "task5_template",
  "params": {
    "search_string": "foo",
    "start_date": "2022-01-01"
  }
}

PUT task5/_bulk
{"index": {}}
{"a": "foo", "b": 10, "timestamp": "2022-01-01"}
{"index": {}}
{"a": "foo", "b": 4, "timestamp": "2022-02-01"}
{"index": {}}
{"a": "foo bar", "b": 34, "timestamp": "2022-03-01"}
{"index": {}}
{"a": "bar", "b": 2, "timestamp": "2021-01-01"}

PUT _scripts/task5_template
{
  "script": {
    "lang": "mustache",
    "source": """
    {
      "query": {
        "bool": {
          "filter": [
            {"match": {"a": "{{search_string}}"}},
            {"range": {
              "timestamp": {
                "gte": "{{start_date}}",
                "lte": "{{end_date}}{{^end_date}}now/d{{/end_date}}"
              }
            }}
          ]
        }
      },
      "highlight": {
        "fields": {
          "a": {}
        },
        "pre_tags": ["<strong>"],
        "post_tags": ["</strong>"]
      },
      "sort": [
        {
          "b": {
            "order": "desc"
          }
        },
        {
          "_score": {
            "order": "asc"
          }
        }
      ]
    }
    """
  }
}

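Note: the source has to hold a long query and Kibana offers no autocompletion inside it, so typing it directly is error-prone. It is easier to write the query in a regular _search request first and then copy it into the source field. When copying, be sure to include the outer {} that wraps the query, i.e. use "source": """ {"query": {}} """ rather than "source": """ "query": {} """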
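
The default for end_date relies on a Mustache inverted section: {{end_date}} renders the value when the parameter is present, and {{^end_date}}now/d{{/end_date}} renders its body only when the parameter is missing or empty. The function below is a hand-written Python model of that one template line, not a Mustache renderer; render_lte is a made-up name.

```python
def render_lte(params):
    # {{end_date}}: the parameter value, or an empty string when absent.
    end_date = params.get("end_date", "")
    # {{^end_date}}now/d{{/end_date}}: rendered only when end_date is absent or empty.
    fallback = "" if end_date else "now/d"
    return f"{end_date}{fallback}"


print(render_lte({"search_string": "foo", "start_date": "2022-01-01"}))
# now/d
print(render_lte({"search_string": "foo", "start_date": "2022-01-01", "end_date": "2022-02-15"}))
# 2022-02-15
```

Exactly one of the two parts produces output, so the lte value is never empty and never contains both.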

Sample 16

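Run a term query on field a and a match query on field b, and weight the score by field c, which is derived from two other fields.

Solution (tested on a scratch index task6, where the derived field is the runtime field z = x + y):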

GET task6/_search
{
  "runtime_mappings": {
    "z": {
      "type": "long",
      "script": {
        "source": "emit(doc['x'].value + doc['y'].value)"
      }
    }
  },
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "must": [
            {"match": {"b": "hello"}},
            {"term": {"a": "foo"}}
          ]
        }
      },
      "script_score": {
        "script": {
          "source": "_score * doc['z'].value"
        }
      }
    }
  }
}

PUT task6/_bulk
{"index": {}}
{"x": 2, "y": 4, "a": "foo", "b": "hello world"}
{"index": {}}
{"x": 100, "y": 50, "a": "bar", "b": "hello world 1"}
{"index": {}}
{"x": 200, "y": 10, "a": "foo", "b": "hello"}

Note what function_score does: it takes an existing query and modifies the score that query produces.

From: https://www.cnblogs.com/hackerain/p/17206834.html
