
7.3 Elasticsearch Internals: Sorting

Date: 2022-10-24 18:02:02
Tags: count, doc, age, value, 7.3, ElasticSearch, key, internals, avg


1. Introduction
By default, Elasticsearch sorts results by relevance score. You can define your own ordering with the sort parameter.

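2. query
Find the documents whose job field matches "Java engineer", then sort them in descending order using birthday as the first sort key, relevance score as the second, and document id as the third.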

POST /employee/_search
{
  "query": {
    "match": {
      "job": "Java engineer"
    }
  },
  "sort": [
    {
      "birthday": "desc"
    },
    {
      "_score": "desc"
    },
    {
      "_doc": "desc"
    }
  ]
}
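  • _score: the relevance score
  • _doc: the internal document id, which follows index order (unique within a shard, but the same value can occur on different shards)
The response is shown below. Each hit's sort array holds the values used for ordering, with birthday expressed in epoch milliseconds.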
{
"took" : 5,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 4,
"relation" : "eq"
},
"max_score" : null,
"hits" : [
{
"_index" : "employee",
"_type" : "_doc",
"_id" : "x4l1hnsBEsHOdz1YbMo-",
"_score" : 0.2876821,
"_source" : {
"name" : "Jason Tatum",
"job" : "Java engineer",
"age" : 24,
"salary" : 15000.0,
"birthday" : "1997-08-02"
},
"sort" : [
870480000000,
0.2876821,
0
]
},
{
"_index" : "employee",
"_type" : "_doc",
"_id" : "wol1hnsBEsHOdz1YHsp3",
"_score" : 0.6931472,
"_source" : {
"name" : "Stephen Curry",
"job" : "Java engineer",
"age" : 27,
"salary" : 20000.0,
"birthday" : "1995-08-06"
},
"sort" : [
807667200000,
0.6931472,
0
]
},
{
"_index" : "employee",
"_type" : "_doc",
"_id" : "wYl0hnsBEsHOdz1Y4cqT",
"_score" : 0.47000363,
"_source" : {
"name" : "James Harden",
"job" : "Java engineer",
"age" : 31,
"salary" : 30000.0,
"birthday" : "1991-01-01"
},
"sort" : [
662688000000,
0.47000363,
0
]
},
{
"_index" : "employee",
"_type" : "_doc",
"_id" : "xol1hnsBEsHOdz1YXcqt",
"_score" : 0.47000363,
"_source" : {
"name" : "Chirs Paul",
"job" : "Java engineer",
"age" : 33,
"salary" : 29000.0,
"birthday" : "1988-12-02"
},
"sort" : [
597024000000,
0.47000363,
2
]
}
]
}
}
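
To make the role of the sort array concrete, here is a minimal Python sketch that re-sorts the four hits above locally. It only illustrates multi-key descending ordering and is not how Elasticsearch implements sorting. The helper name to_epoch_millis is hypothetical, and the sketch assumes dates are interpreted as midnight UTC, which matches the sort values in the response.

```python
from datetime import datetime, timezone


def to_epoch_millis(date_str):
    """Convert a yyyy-MM-dd string to epoch milliseconds (assumes midnight UTC)."""
    dt = datetime.strptime(date_str, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return int(dt.timestamp() * 1000)


# Values copied from the response above, deliberately listed out of order.
hits = [
    {"name": "James Harden", "birthday": "1991-01-01", "_score": 0.47000363, "_doc": 0},
    {"name": "Chirs Paul", "birthday": "1988-12-02", "_score": 0.47000363, "_doc": 2},
    {"name": "Jason Tatum", "birthday": "1997-08-02", "_score": 0.2876821, "_doc": 0},
    {"name": "Stephen Curry", "birthday": "1995-08-06", "_score": 0.6931472, "_doc": 0},
]


def sort_key(hit):
    # Same key order as the request: birthday, then _score, then _doc.
    return (to_epoch_millis(hit["birthday"]), hit["_score"], hit["_doc"])


# All three keys are "desc", so one reversed tuple sort is enough.
ordered = sorted(hits, key=sort_key, reverse=True)
names = [hit["name"] for hit in ordered]
print(names)
```

Because the birthdays are all different, the first key alone decides the order here; _score and _doc would only matter as tie-breakers.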

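3. Sorting on strings
(1). Introduction
String sorting is a special case because Elasticsearch has two string types, text and keyword. Sorting on a text field fails with the following error.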

POST /employee/_search
{
  "sort": [
    {
      "name": "desc"
    }
  ]
}
{
"error": {
"root_cause": [
{
"type": "illegal_argument_exception",
"reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [name] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
}
],
"type": "search_phase_execution_exception",
"reason": "all shards failed",
"phase": "query",
"grouped": true,
"failed_shards": [
{
"shard": 0,
"index": "employee",
"node": "0JDoSBAVQr29ZB1mytSQgw",
"reason": {
"type": "illegal_argument_exception",
"reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [name] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
}
}
],
"caused_by": {
"type": "illegal_argument_exception",
"reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [name] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead.",
"caused_by": {
"type": "illegal_argument_exception",
"reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [name] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
}
}
},
"status": 400
}

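By default, fielddata is disabled on text fields. To make this sort work, you can set fielddata=true on the name field, which loads fielddata into memory by uninverting the inverted index. Note that this can use a significant amount of memory, so the recommended alternative is to map name as a keyword field.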
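
As a rough illustration of that advice, the sketch below picks a field name that is safe to sort on from a mapping's properties. Everything in it is an assumption made for the example: the function sortable_field, the sample mapping, and the keyword sub-field named keyword are hypothetical and do not come from the employee index shown here.

```python
def sortable_field(properties, field):
    """Return a field name that can be sorted on, or raise if there is none.

    `properties` is the "properties" object of an index mapping.
    """
    spec = properties[field]
    if spec.get("type") != "text":
        # Non-text fields can be sorted on directly.
        return field
    if spec.get("fielddata"):
        # Sorting works, but on analyzed terms and at the cost of heap memory.
        return field
    for sub_name, sub_spec in spec.get("fields", {}).items():
        if sub_spec.get("type") == "keyword":
            # Prefer a keyword sub-field when the mapping defines one.
            return f"{field}.{sub_name}"
    raise ValueError(f"text field [{field}] has no fielddata and no keyword sub-field")


# Hypothetical mapping: name is text with a keyword sub-field, job is keyword.
example_properties = {
    "name": {"type": "text", "fields": {"keyword": {"type": "keyword"}}},
    "job": {"type": "keyword"},
    "description": {"type": "text"},
}

print(sortable_field(example_properties, "name"))
print(sortable_field(example_properties, "job"))
```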

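(2). Solution
Sorting is essentially a comparison of the fields' original values. The inverted index cannot help here. Sorting needs a forward index, one that quickly returns a field's original value given a document id and a field name. Elasticsearch offers two implementations: fielddata, which is disabled by default, and doc_values, which is enabled by default for every type except text. The two compare as follows.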

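| Comparison   | fielddata | doc_values |
| ------------ | --------- | ---------- |
| Created when | On the fly at search time | At index time, together with the inverted index |
| Stored in    | JVM heap | Disk |
| Advantage    | Uses no extra disk space | Uses no heap memory |
| Disadvantage | With many documents, on-the-fly creation takes a long time and uses a lot of heap memory | Slows down indexing and uses extra disk space |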
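
The difference between the two structures can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Elasticsearch's actual data layout: the three documents are made up, analysis is reduced to lowercasing and splitting on whitespace, and the uninvert function is a hypothetical stand-in for what loading fielddata does.

```python
from collections import defaultdict

# Three made-up documents: doc id -> original field value.
docs = {0: "Java engineer", 1: "Vue engineer", 2: "Java engineer"}

# Inverted index: term -> list of doc ids. Good for "which docs contain X?".
inverted = defaultdict(list)
for doc_id, text in docs.items():
    for term in text.lower().split():
        inverted[term].append(doc_id)

# doc_values-like column, written at index time: doc id -> original value.
# Sorting only needs this lookup direction.
column = dict(docs)


def uninvert(inverted_index):
    """Rebuild doc id -> terms from term -> doc ids (fielddata-like, done at search time)."""
    forward = defaultdict(list)
    for term in sorted(inverted_index):
        for doc_id in inverted_index[term]:
            forward[doc_id].append(term)
    return dict(forward)


fielddata = uninvert(inverted)
print(dict(inverted))
print(column[1])
print(fielddata)
```

Note that the uninverted structure only recovers terms, not the original string, which is why sorting a text field through fielddata behaves differently from sorting a keyword field.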

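3. fielddata
(1). Introduction
fielddata is disabled by default. Once it is enabled, strings are sorted by their analyzed terms rather than by the original value, so the result often does not match expectations. In practice it is mostly enabled to run aggregations on analyzed terms. Note that fielddata only applies to text fields.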
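
The following Python sketch shows why term-based sorting can surprise. It uses the four names from the earlier search response and compares a descending sort on the whole value (keyword-like) with a descending sort on analyzed terms (fielddata-like). Two assumptions are baked in: analysis is approximated by lowercasing and splitting on whitespace, and a descending sort on a multi-term field compares each document's largest term.

```python
names = ["Jason Tatum", "Stephen Curry", "James Harden", "Chirs Paul"]


def analyze(text):
    # Simplified stand-in for an analyzer: lowercase, split on whitespace.
    return text.lower().split()


# keyword-like: compare the original, unanalyzed value.
by_keyword = sorted(names, reverse=True)

# fielddata-like: compare the largest analyzed term of each document (assumption).
by_terms = sorted(names, key=lambda name: max(analyze(name)), reverse=True)

print(by_keyword)
print(by_terms)
```

With term-based sorting, "Jason Tatum" comes first because of the term "tatum", even though a reader would expect names starting with "S" to lead a descending list.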

(2). Enabling

PUT /people/_mapping
{
  "properties": {
    "name": {
      "type": "text",
      "fielddata": true
    },
    "country": {
      "type": "keyword"
    },
    "age": {
      "type": "integer"
    },
    "birthday": {
      "type": "date",
      "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
    },
    "description": {
      "type": "text"
    }
  }
}

4. doc_values
(1). Introduction
doc_values is enabled by default. If you know when creating the index that a field will never be used for sorting or aggregations, you can disable it to speed up indexing and save disk space. Turning doc_values back on later requires a reindex.

(2). Disabling

PUT /people/_mapping
{
  "properties": {
    "name": {
      "type": "text",
      "fielddata": true
    },
    "country": {
      "type": "keyword"
    },
    "age": {
      "type": "integer",
      "doc_values": false
    },
    "birthday": {
      "type": "date",
      "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
    },
    "description": {
      "type": "text"
    }
  }
}

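5. Sorting in aggregations
(1). Introduction
Buckets can be ordered with built-in keys: _count orders by document count and _key orders by the bucket's field value. The basic syntax is as follows.

POST /employee/_search
{
  "size": 0,
  "aggs": {
    "aggs_name": {
      "terms": {
        "field": "job",
        "size": 10,
        "order": [
          {
            "_count": "desc"
          },
          {
            "_key": "desc"
          }
        ]
      }
    }
  }
}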
{
"took" : 6,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 7,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"aggs_name" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "Java engineer",
"doc_count" : 4
},
{
"key" : "Vue engineer",
"doc_count" : 2
},
{
"key" : "Technical director",
"doc_count" : 1
}
]
}
}
}
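
The ordering rule in this request can be mimicked locally. The sketch below is a plain-Python approximation of a terms aggregation ordered by _count descending and then _key descending. The list of job values is reconstructed from the bucket counts in the response above and is not the real index contents.

```python
from collections import Counter

# Job values reconstructed from the doc_count figures in the response.
jobs = ["Java engineer"] * 4 + ["Vue engineer"] * 2 + ["Technical director"]

counts = Counter(jobs)

# Primary key: document count; secondary key: the bucket key itself.
# Both are "desc", so a single reversed sort on the tuple is enough.
ordered = sorted(counts.items(), key=lambda item: (item[1], item[0]), reverse=True)

buckets = [{"key": key, "doc_count": count} for key, count in ordered]
print(buckets)
```

The _key entry only takes effect when two buckets have the same doc_count, which does not happen in this data set.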

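(2). order
Buckets can also be ordered by the value of a sub-aggregation, referenced by its name (avg_age in this example).

POST /employee/_search
{
  "size": 0,
  "aggs": {
    "histogram_salary": {
      "histogram": {
        "field": "salary",
        "interval": 5000,
        "order": {
          "avg_age": "desc"
        }
      },
      "aggs": {
        "avg_age": {
          "avg": {
            "field": "age"
          }
        }
      }
    }
  }
}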
{
"took" : 23,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 7,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"histogram_salary" : {
"buckets" : [
{
"key" : 50000.0,
"doc_count" : 1,
"avg_age" : {
"value" : 35.0
}
},
{
"key" : 25000.0,
"doc_count" : 2,
"avg_age" : {
"value" : 31.5
}
},
{
"key" : 30000.0,
"doc_count" : 1,
"avg_age" : {
"value" : 31.0
}
},
{
"key" : 20000.0,
"doc_count" : 1,
"avg_age" : {
"value" : 27.0
}
},
{
"key" : 15000.0,
"doc_count" : 2,
"avg_age" : {
"value" : 24.5
}
},
{
"key" : 35000.0,
"doc_count" : 0,
"avg_age" : {
"value" : null
}
},
{
"key" : 40000.0,
"doc_count" : 0,
"avg_age" : {
"value" : null
}
},
{
"key" : 45000.0,
"doc_count" : 0,
"avg_age" : {
"value" : null
}
}
]
}
}
}
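
A small Python sketch can reproduce the non-empty buckets of this response. The first four (age, salary) pairs come from the search response in section 2. The last three are hypothetical values chosen only so that the bucket counts and averages match the output above, since the article never shows those documents. Empty buckets are left out of the sketch.

```python
from collections import defaultdict

INTERVAL = 5000

employees = [
    # (age, salary) taken from the search response in section 2
    (24, 15000.0), (27, 20000.0), (31, 30000.0), (33, 29000.0),
    # hypothetical documents, chosen to match the aggregation output
    (25, 18000.0), (30, 26000.0), (35, 50000.0),
]


def bucket_key(salary, interval=INTERVAL):
    # A histogram bucket key is the value rounded down to a multiple of the interval.
    return (salary // interval) * interval


ages_by_bucket = defaultdict(list)
for age, salary in employees:
    ages_by_bucket[bucket_key(salary)].append(age)

buckets = [
    {"key": key, "doc_count": len(ages), "avg_age": sum(ages) / len(ages)}
    for key, ages in ages_by_bucket.items()
]

# "order": {"avg_age": "desc"} sorts buckets by the sub-aggregation value.
buckets.sort(key=lambda bucket: bucket["avg_age"], reverse=True)
print([(bucket["key"], bucket["avg_age"]) for bucket in buckets])
```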

(3). Deeper nesting
When the metric sits inside another sub-aggregation, the order path uses > to step through the levels, as in age>avg_age.

POST /employee/_search
{
  "size": 0,
  "aggs": {
    "histogram_salary": {
      "histogram": {
        "field": "salary",
        "interval": 5000,
        "order": {
          "age>avg_age": "desc"
        }
      },
      "aggs": {
        "age": {
          "filter": {
            "range": {
              "age": {
                "gte": 10
              }
            }
          },
          "aggs": {
            "avg_age": {
              "avg": {
                "field": "age"
              }
            }
          }
        }
      }
    }
  }
}
{
"took" : 6,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 7,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"histogram_salary" : {
"buckets" : [
{
"key" : 50000.0,
"doc_count" : 1,
"age" : {
"doc_count" : 1,
"avg_age" : {
"value" : 35.0
}
}
},
{
"key" : 25000.0,
"doc_count" : 2,
"age" : {
"doc_count" : 2,
"avg_age" : {
"value" : 31.5
}
}
},
{
"key" : 30000.0,
"doc_count" : 1,
"age" : {
"doc_count" : 1,
"avg_age" : {
"value" : 31.0
}
}
},
{
"key" : 20000.0,
"doc_count" : 1,
"age" : {
"doc_count" : 1,
"avg_age" : {
"value" : 27.0
}
}
},
{
"key" : 15000.0,
"doc_count" : 2,
"age" : {
"doc_count" : 2,
"avg_age" : {
"value" : 24.5
}
}
},
{
"key" : 35000.0,
"doc_count" : 0,
"age" : {
"doc_count" : 0,
"avg_age" : {
"value" : null
}
}
},
{
"key" : 40000.0,
"doc_count" : 0,
"age" : {
"doc_count" : 0,
"avg_age" : {
"value" : null
}
}
},
{
"key" : 45000.0,
"doc_count" : 0,
"age" : {
"doc_count" : 0,
"avg_age" : {
"value" : null
}
}
}
]
}
}
}
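
To show how a path such as age>avg_age maps onto the response structure, here is a short Python sketch that walks the path through a bucket and then sorts a few buckets by the resolved value. The function resolve_path is a hypothetical helper written for this example, and placing buckets with a null value last is simply a choice that mirrors the output above. The bucket data is a subset of that output.

```python
def resolve_path(bucket, path):
    """Follow an order path like "age>avg_age" and return the metric value."""
    node = bucket
    for name in path.split(">"):
        node = node[name]
    return node["value"]


# A subset of the buckets from the response above, in arbitrary order.
sample_buckets = [
    {"key": 15000.0, "age": {"doc_count": 2, "avg_age": {"value": 24.5}}},
    {"key": 50000.0, "age": {"doc_count": 1, "avg_age": {"value": 35.0}}},
    {"key": 35000.0, "age": {"doc_count": 0, "avg_age": {"value": None}}},
    {"key": 25000.0, "age": {"doc_count": 2, "avg_age": {"value": 31.5}}},
]


def order_key(bucket):
    value = resolve_path(bucket, "age>avg_age")
    # Buckets without a value sort after all buckets that have one.
    return (value is not None, value if value is not None else 0.0)


ordered = sorted(sample_buckets, key=order_key, reverse=True)
print([bucket["key"] for bucket in ordered])
```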


From: https://blog.51cto.com/u_15843693/5790759
