Analyzing an ES bucket aggregation query error: trying to create too many buckets

Tags: search, bucketing, color, many, buckets, query error, max, size

Scenario

An ES query failed with the following error:

trying to create too many buckets. must be less than or equal to: [10000] but was [10001].

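Problem analysis

A quick search online turned up the cause in most results: Elasticsearch limits the number of buckets an aggregation query may create, 10000 by default in the version used here, and a query that exceeds the limit fails with this error. The limit exists because bucket aggregations are memory-intensive: when a query creates too many buckets, or such queries run frequently, the cluster's JVM heap can run short and the circuit breaker trips.

The official documentation confirms this.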

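Reproducing the problem

Note this sentence from the official documentation:

Requests that attempt to return more than this limit will return an error.

Let's verify this by reproducing the problem.

Environment: the demo uses Elasticsearch 7.8.1, with requests sent from Postman.

Create the index for testing bucket aggregations (PUT request)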

localhost:9200/testbuckets
{
    "mappings": {
        "properties": {
            "name": {
                "type": "keyword"
            },
            "price": {
                "type": "long"
            },
            "color": {
                "type": "keyword"
            },
            "size": {
                "type": "float"
            },
            "category": {
                "type": "keyword"
            },
            "label": {
                "type": "keyword"
            },
            "release_date": {
                "type": "date"
            }
        }
    }
}

==== Response ====
{
    "acknowledged": true,
    "shards_acknowledged": true,
    "index": "testbuckets"
}

Next, build the test data. To have something to bucket, I create a few mobile phone records for the demo.

Insert the data (PUT request)

localhost:9200/testbuckets/_bulk
{"index":{}}
{"name":"小米13","price":3400,"color":"白色","size":6.21,"category":"小米","label":"高端","release_date":"2022-02-06"}
{"index":{}}
{"name":"小米13","price":3400,"color":"黑色","size":6.21,"category":"小米","label":"高端","release_date":"2022-02-06"}
{"index":{}}
{"name":"小米12","price":2400,"color":"黑色","size":6.21,"category":"小米","label":"快充","release_date":"2021-02-06"}
{"index":{}}
{"name":"苹果13","price":3400,"color":"黑色","size":6.21,"category":"苹果","label":"性能","release_date":"2022-02-06"}
{"index":{}}
{"name":"苹果14","price":2400,"color":"远峰蓝色","size":6.21,"category":"苹果","label":"性能","release_date":"2021-02-06"}
{"index":{}}
{"name":"华为Mate 50","price":5200,"color":"白色","size":6.21,"category":"华为","label":"鸿蒙OS","release_date":"2021-05-06"}
{"index":{}}
{"name":"华为Mate 50","price":5200,"color":"黑色","size":6.21,"category":"华为","label":"鸿蒙OS","release_date":"2021-05-06"}
{"index":{}}
{"name":"华为Mate 40 Pro","price":5900,"color":"黑色","size":6.21,"category":"华为","label":"商务","release_date":"2022-02-06"}
{"index":{}}
{"name":"华为Mate 40 Pro","price":5900,"color":"白色","size":6.21,"category":"华为","label":"商务","release_date":"2022-02-06"}

Note: when bulk-inserting the test data from Postman, the body must end with a newline after the last record, otherwise the insert fails.
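
The trailing-newline requirement is easy to miss when the bulk body is assembled by hand. As a minimal sketch (the helper name build_bulk_body is made up for illustration, and only a subset of the fields is shown), the body could be generated in Python so the final newline is always present:

```python
import json


def build_bulk_body(docs):
    """Build an NDJSON body for the _bulk API: one action line and one source line per document."""
    lines = []
    for doc in docs:
        # Action line: index the document and let ES generate the _id.
        lines.append(json.dumps({"index": {}}))
        # Source line: the document itself, keeping non-ASCII text readable.
        lines.append(json.dumps(doc, ensure_ascii=False))
    # The bulk body must end with a newline after the last line.
    return "\n".join(lines) + "\n"


docs = [
    {"name": "小米13", "price": 3400, "color": "白色"},
    {"name": "小米13", "price": 3400, "color": "黑色"},
]
body = build_bulk_body(docs)
print(body, end="")
```

The resulting string can be pasted into Postman or sent as the request body to localhost:9200/testbuckets/_bulk.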

Change the ES maximum bucket setting (PUT request)

http://127.0.0.1:9200/_cluster/settings
{
  "persistent": {
    "search.max_buckets": 2  // set to 2 for the demo; the default in this version is 10000
  }
}

Build the aggregation query. With the phone test index created above, we now group the data by phone color:

Query the data (GET request)

http://127.0.0.1:9200/testbuckets/_search
{
    "aggs": { 
        "by_color": {  // aggregate by color
            "terms": { 
                "field": "color"  // field to aggregate on
            }
        }
    },
    "size": 0 // return no hits, only the aggregation results
}

=== Response ===
{
    "error": {
        "root_cause": [
            {
                "type": "too_many_buckets_exception",
                "reason": "Trying to create too many buckets. Must be less than or equal to: [2] but was [3]. This limit can be set by changing the [search.max_buckets] cluster level setting.",
                "max_buckets": 2
            }
        ],
        "type": "search_phase_execution_exception",
        "reason": "all shards failed",
        "phase": "query",
        "grouped": true,
        "failed_shards": [
            {
                "shard": 0,
                "index": "testbuckets",
                "node": "UuMcBk37TNWHjY4hVtzyVA",
                "reason": {
                    "type": "too_many_buckets_exception",
                    "reason": "Trying to create too many buckets. Must be less than or equal to: [2] but was [3]. This limit can be set by changing the [search.max_buckets] cluster level setting.",
                    "max_buckets": 2
                }
            }
        ]
    },
    "status": 503
}
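
Client code usually needs to tell this failure apart from other search errors. The following is a rough sketch, assuming the error payload has the shape shown above; the function name find_bucket_limit_error is hypothetical and the sample string is a shortened copy of that response:

```python
import json


def find_bucket_limit_error(response_text):
    """Return (max_buckets, reason) if the response reports too_many_buckets_exception, else None."""
    payload = json.loads(response_text)
    error = payload.get("error")
    if not isinstance(error, dict):
        return None
    for cause in error.get("root_cause", []):
        if cause.get("type") == "too_many_buckets_exception":
            return cause.get("max_buckets"), cause.get("reason")
    return None


sample = json.dumps({
    "error": {
        "root_cause": [{
            "type": "too_many_buckets_exception",
            "reason": "Trying to create too many buckets. Must be less than or equal to: [2] but was [3].",
            "max_buckets": 2,
        }],
        "type": "search_phase_execution_exception",
    },
    "status": 503,
})
result = find_bucket_limit_error(sample)
print(result[0] if result else "no bucket limit error")
```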

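The error is reproduced. We had just set search.max_buckets to 2, but bucketing the test data by color yields three distinct values: 黑色 (black), 白色 (white), and 远峰蓝色 (Sierra Blue). The number of buckets exceeds the limit, so the query fails, which confirms the documentation's statement that "Requests that attempt to return more than this limit will return an error".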
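
The behavior can be mimicked in a few lines of Python. This is a simplified model for illustration only, not how Elasticsearch implements the check: terms_buckets and TooManyBucketsError are made-up names, and the color list mirrors the nine test documents.

```python
from collections import Counter


class TooManyBucketsError(Exception):
    """Raised by this toy model when the bucket count exceeds the limit."""


def terms_buckets(docs, field, max_buckets):
    """Group docs by field, like a terms aggregation, and enforce a bucket limit."""
    counts = Counter(doc[field] for doc in docs if field in doc)
    if len(counts) > max_buckets:
        raise TooManyBucketsError(
            f"Trying to create too many buckets. Must be less than or equal to: "
            f"[{max_buckets}] but was [{len(counts)}]."
        )
    # Buckets ordered by document count, highest first.
    return counts.most_common()


colors = ["白色", "黑色", "黑色", "黑色", "远峰蓝色", "白色", "黑色", "黑色", "白色"]
docs = [{"color": c} for c in colors]

try:
    terms_buckets(docs, "color", max_buckets=2)
except TooManyBucketsError as exc:
    print(exc)

print(terms_buckets(docs, "color", max_buckets=3))
```

With a limit of 2 the three distinct colors trigger the error; with a limit of 3 the same data is grouped without complaint.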

Now raise search.max_buckets slightly and try again (PUT request)

http://127.0.0.1:9200/_cluster/settings
{
  "persistent": {
    "search.max_buckets": 3
  }
}
=== Response ===
{
    "acknowledged": true,
    "persistent": {
        "search": {
            "max_buckets": "3"
        }
    },
    "transient": {}
}

Run the query again (GET request); this time it succeeds.

http://127.0.0.1:9200/testbuckets/_search
{
    "aggs": { 
        "by_color": {  // aggregate by color
            "terms": { 
                "field": "color"  // field to aggregate on
            }
        }
    },
    "size": 0 // return no hits, only the aggregation results
}
=== Response ===
{
    "took": 2,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 9,
            "relation": "eq"
        },
        "max_score": null,
        "hits": []
    },
    "aggregations": {
        "by_color": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {
                    "key": "黑色",
                    "doc_count": 5
                },
                {
                    "key": "白色",
                    "doc_count": 3
                },
                {
                    "key": "远峰蓝色",
                    "doc_count": 1
                }
            ]
        }
    }
}

Solutions

Solution 1

Change the setting (PUT request):

// Temporary: lost when the cluster restarts
http://127.0.0.1:9200/_cluster/settings
{"transient": {"search.max_buckets": 50000}}

// Permanent: still in effect after a restart
http://127.0.0.1:9200/_cluster/settings
{"persistent": {"search.max_buckets": 50000}}

On a server you can use curl:

# Temporary: lost when the cluster restarts
curl -X PUT "http://127.0.0.1:9200/_cluster/settings" -H 'Content-Type: application/json' -d '{"transient":{"search.max_buckets": 50000}}'

# Permanent: still in effect after a restart
curl -X PUT "http://127.0.0.1:9200/_cluster/settings" -H 'Content-Type: application/json' -d '{"persistent":{"search.max_buckets": 50000}}'

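Note: ES cluster settings can be changed in two ways. The demo above used persistent; the two differ as follows.
transient: the setting stays in effect until the cluster restarts. After a full cluster restart it is cleared.
persistent: the setting is saved permanently unless it is changed again by hand. It is persisted to disk, so it survives restarts.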
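
For scripted changes, the same request can be prepared with the Python standard library. This is a sketch under assumptions: max_buckets_request is a hypothetical helper, the base URL is the local address used throughout this post, and the request is only built here, not sent. Assigning null (None in Python) to a cluster setting removes the explicit value, which is handy for undoing the demo value of 3.

```python
import json
import urllib.request


def max_buckets_request(value, scope="persistent", base_url="http://127.0.0.1:9200"):
    """Build a PUT /_cluster/settings request for search.max_buckets (not sent here)."""
    if scope not in ("persistent", "transient"):
        raise ValueError("scope must be 'persistent' or 'transient'")
    body = json.dumps({scope: {"search.max_buckets": value}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/_cluster/settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Temporary change, cleared by a full cluster restart.
req = max_buckets_request(50000, scope="transient")
# None serializes to null, which removes the explicit persistent value.
reset = max_buckets_request(None)

print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
print(reset.data.decode("utf-8"))
# To actually send a request: urllib.request.urlopen(req)
```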

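Drawback: bucket aggregations are memory-intensive. With too many buckets, or frequent queries of this kind, the cluster's JVM heap can run short and trigger the circuit breaker. The limit therefore cannot be raised without bound; choose a value that fits both what the servers can handle and what the business actually needs.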

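Solution 2

Rework the business logic. In most scenarios a query is meant to return a small, precise result set, and the default limit of 10000 exists for a reason. Assess your servers' performance and whether the ES service can sustain this volume of aggregation work; raising the limit rashly can bring the service down. If the expected number of buckets is very large, analyze the specific scenario and optimize at the query level.

Taking the phone test index again: if the query must succeed without changing the setting, it can be optimized along these lines.

This is only a reference idea; in production, design the query around the actual business requirements.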

http://127.0.0.1:9200/testbuckets/_search
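{
    "query": {
        "match": {
            "category": "小米"  // restrict the phone category to narrow the data set; the 小米 (Xiaomi) brand is used as the example
        }
    },
    "aggs": { 
        "by_color": { 
            "terms": { 
                "field": "color" 
            }
        }
    },
    "size": 0 // return no hits, only the aggregation results
}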

=== Response ===
{
    "took": 2,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 3,
            "relation": "eq"
        },
        "max_score": null,
        "hits": []
    },
    "aggregations": {
        "by_color": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {
                    "key": "黑色",
                    "doc_count": 2
                },
                {
                    "key": "白色",
                    "doc_count": 1
                }
            ]
        }
    }
}
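
Another query-level option, not covered in the original walkthrough, is to page through the buckets with a composite aggregation so that each request creates only a small number of buckets. The sketch below is an assumption-laden illustration: composite_color_query, iter_color_buckets and fake_search are made-up names, and fake_search is a stand-in that imitates the response shape instead of calling /testbuckets/_search. Whether this keeps a given query under search.max_buckets depends on the page size and the ES version.

```python
def composite_color_query(page_size, after_key=None):
    """Build a search body that pages through color buckets with a composite aggregation."""
    composite = {
        "size": page_size,
        "sources": [{"color": {"terms": {"field": "color"}}}],
    }
    if after_key is not None:
        # Resume after the last bucket of the previous page.
        composite["after"] = after_key
    return {"size": 0, "aggs": {"by_color": {"composite": composite}}}


def iter_color_buckets(search, page_size=2):
    """Yield (color, doc_count) pairs page by page; search sends a body and returns the parsed response."""
    after_key = None
    while True:
        agg = search(composite_color_query(page_size, after_key))["aggregations"]["by_color"]
        for bucket in agg["buckets"]:
            yield bucket["key"]["color"], bucket["doc_count"]
        after_key = agg.get("after_key")
        if after_key is None or not agg["buckets"]:
            break


def fake_search(body):
    """Stand-in for the HTTP call: serves canned buckets in ascending key order."""
    data = sorted([("黑色", 5), ("白色", 3), ("远峰蓝色", 1)])
    composite = body["aggs"]["by_color"]["composite"]
    after = composite.get("after", {}).get("color")
    remaining = [(k, n) for k, n in data if after is None or k > after]
    page = remaining[: composite["size"]]
    agg = {"buckets": [{"key": {"color": k}, "doc_count": n} for k, n in page]}
    if page:
        agg["after_key"] = {"color": page[-1][0]}
    return {"aggregations": {"by_color": agg}}


results = list(iter_color_buckets(fake_search, page_size=2))
print(results)
```

In real use, search would be replaced by a function that sends the body to the _search endpoint and returns the decoded JSON.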

From: https://www.cnblogs.com/lanshan-blog/p/17072815.html
