一、问题描述:
在mysql数据库中,我们可以很方面的通过having关键字实现对聚合结果的过滤查询。那么,在ES中该如何实现类似having的先聚合再过滤查询呢?
二、业务场景:
需要找出下单次数大于等于2单,并且平均下单金额大于等于100的客户
在关系型数据库中对应的SQL语句:
SELECT
userId,
AVG(amount) avgAmount,
count(*) orderCount
FROM order
GROUP by userId
HAVING avgAmount >= 100 and orderCount >=2
三、数据准备
创建订单索引order_index,并添加测试数据。
## 删除索引
## DELETE order_index
## 新建索引
PUT order_index
{
"mappings": {
"properties": {
"name": {
"type": "keyword"
},
"amount": {
"type": "integer"
}
}
}
}
## 添加数据
POST order_index/_bulk?refresh
{ "create": { } }
{ "name": "老万", "amount": 100}
{ "create": { } }
{ "name": "老万", "amount": 80}
{ "create": { } }
{ "name": "老万", "amount": 300}
{ "create": { } }
{ "name": "老王", "amount": 45}
{ "create": { } }
{ "name": "小明", "amount": 15}
{ "create": { } }
{ "name": "小明", "amount": 50}
{ "create": { } }
{ "name": "小红", "amount": 300}
四、具体实现
1、SQL实现方式
说明:由于ES6.3以后已经支持sql查询,所有首先尝试大家最熟悉的sql查询方案能否实现。
POST /_sql?format=txt
{
"query": "SELECT name,AVG(amount) avgAmount,count(*) orderCount FROM order_index group by name having avgAmount >= 100 and orderCount >=2 "
}
查询结果:
用户名为老万,满足平均订单金额大于100,且下单数大于2。
查询结果正确。
2、DSL实现方式
GET order_index/_search
{
"size": 0,
"aggs": {
"groupName": {
"terms": {
"field": "name"
},
"aggs": {
"avgAmount": {
"avg": {
"field": "amount"
}
},
"having": {
"bucket_selector": {
"buckets_path": {
"orderCount": "_count",
"avgAmount": "avgAmount"
},
"script": {
"source": "params.avgAmount >= 100 && params.orderCount >=2 "
}
}
}
}
}
}
}
查询结果:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 7,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"groupUserId" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "老万",
"doc_count" : 3,
"avgAmount" : {
"value" : 160.0
}
}
]
}
}
}
sql语句底层实现分析:
POST /_sql/translate
{
"query": "SELECT name,AVG(amount) avgAmount,count(*) orderCount FROM order_index group by name having avgAmount >= 100 and orderCount >=2 "
}
执行结果:
分析sql转化的DSL语句,和上面DSL语句的实现,说明两者底层实现原理一致。
mysql中通过having实现根据聚合结果进行过滤,ES中使用 bucket_selector 来实现此功能。
{
"size" : 0,
"_source" : false,
"stored_fields" : "_none_",
"aggregations" : {
"groupby" : {
"composite" : {
"size" : 1000,
"sources" : [
{
"7e80e5b2" : {
"terms" : {
"field" : "name",
"missing_bucket" : true,
"order" : "asc"
}
}
}
]
},
"aggregations" : {
"d8415567" : {
"avg" : {
"field" : "amount"
}
},
"having.having.d8415567_&_having.b26c7698" : {
"bucket_selector" : {
"buckets_path" : {
"a0" : "d8415567",
"a1" : "_count"
},
"script" : {
"source" : "InternalQlScriptUtils.nullSafeFilter(InternalQlScriptUtils.and(InternalQlScriptUtils.nullSafeFilter(InternalQlScriptUtils.gte(params.a0,params.v0)),InternalQlScriptUtils.nullSafeFilter(InternalQlScriptUtils.gte(params.a1,params.v1))))",
"lang" : "painless",
"params" : {
"v0" : 100,
"v1" : 2
}
},
"gap_policy" : "skip"
}
}
}
}
}
}
具体实现
本文主要介绍了ES中如何实现类似having的先聚合再过滤查询。
1、介绍了基于sql和dsl的两种实现方式,但是二者的底层原理其实都是一样的。
2、实际项目中,更推荐直接采用sql来实现,代码简单,sql语句相比dsl上手更容易,也更容易理解。
3、mysql中通过having实现根据聚合结果进行过滤,ES中使用 bucket_selector 来实现此功能。