Tags: www, we, source, Elasticsearch, supermarket delivery, query, guide, store

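Business background

We run a food-delivery search system. On top of traditional food delivery, we launched a convenience-store and supermarket feature. Unlike restaurant merchants, many of these merchants are large supermarkets, each with a very large number of items, which caused a large share of our online ES calls to time out and put a heavy load on ES.

Our business spans multiple countries and is currently split by country. Taking Thailand, the most severely affected country, as an example: there are tens of thousands of merchants and tens of millions of items, and some very large supermarkets carry several hundred thousand items each.

Each merchant document includes an item_names field that holds the names of the items the merchant sells.

Whether a merchant is within delivery range is decided in one of two ways: CBR (city-based-radius, centered on the user) and MBR (merchant-based-radius, centered on the merchant). Our distance filter is therefore: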

{
    "query":{
        "bool":{
            "must":[
                {
                    "bool":{
                        "minimum_should_match":"1",
                        "should":[
                            {
                                "bool":{
                                    "filter":[
                                        {
                                            "term":{
                                                "enabled_delivery_radius":0
                                            }
                                        },
                                        {
                                            "geo_distance":{
                                                "distance":"15000m",
                                                "distance_type":"plane",
                                                "geo_location":{
                                                    "lat":13.748888888888888,
                                                    "lon":100.55666666666666
                                                }
                                            }
                                        }
                                    ]
                                }
                            },
                            {
                                "bool":{
                                    "_name":"#mbr_query",
                                    "filter":[
                                        {
                                            "term":{
                                                "enabled_delivery_radius":1
                                            }
                                        },
                                        {
                                            "geo_shape":{
                                                "geometry":{
                                                    "relation":"contains",
                                                    "shape":{
                                                        "coordinates":[
                                                            100.55666666666666,
                                                            13.748888888888889
                                                        ],
                                                        "type":"point"
                                                    }
                                                }
                                            }
                                        }
                                    ]
                                }
                            }
                        ]
                    }
                }
            ]
        }
    }
}

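Recall flow

We use multi-channel recall. The ES-based recall has two steps: recalling merchants, then recalling items.

Adjusting the query

Splitting the query

Location-based queries are known to be relatively expensive. Based on earlier hands-on experience, we decided to split the query so that CBR and MBR run as two parallel queries (supermarket QPS is usually low, so the extra QPS this puts on ES is modest).
Result: with the 600 ms timeout, the timeout rate dropped from 90% to under 1%, with p99 averaging 500 ms.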
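
The split can be sketched roughly as below. This is a minimal Python sketch, assuming a hypothetical `search_fn(body)` callable that sends one query body to ES and returns a list of merchant IDs. The helper names (`build_cbr_query`, `build_mbr_query`, `search_split`) and the ID-based merge are illustrative assumptions, not the production code.

```python
from concurrent.futures import ThreadPoolExecutor


def build_cbr_query(lat, lon, distance="15000m"):
    """CBR branch: merchants within a fixed radius of the user."""
    return {"query": {"bool": {"filter": [
        {"term": {"enabled_delivery_radius": 0}},
        {"geo_distance": {
            "distance": distance,
            "distance_type": "plane",
            "geo_location": {"lat": lat, "lon": lon},
        }},
    ]}}}


def build_mbr_query(lat, lon):
    """MBR branch: merchants whose delivery shape contains the user."""
    return {"query": {"bool": {"filter": [
        {"term": {"enabled_delivery_radius": 1}},
        {"geo_shape": {"geometry": {
            "relation": "contains",
            # GeoJSON order is [lon, lat], as in the original query.
            "shape": {"type": "point", "coordinates": [lon, lat]},
        }}},
    ]}}}


def search_split(search_fn, lat, lon):
    """Run both branches in parallel and merge results, dropping duplicates."""
    bodies = [build_cbr_query(lat, lon), build_mbr_query(lat, lon)]
    with ThreadPoolExecutor(max_workers=2) as pool:
        results = list(pool.map(search_fn, bodies))
    merged, seen = [], set()
    for ids in results:
        for merchant_id in ids:
            if merchant_id not in seen:
                seen.add(merchant_id)
                merged.append(merchant_id)
    return merged
```

Each branch keeps only its own `term` filter plus one geo filter, so neither query has to evaluate the `should` clause that combined them.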

In-depth analysis

ES's default search type is query_then_fetch, as shown in the figure above. Reference: https://www.elastic.co/guide/en/elasticsearch/guide/current/distributed-search.html

The average time spent in each phase can be obtained with GET index/_stats. Reference: https://www.elastic.co/guide/cn/elasticsearch/guide/current/_monitoring_individual_nodes.html
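
As a rough illustration of how the per-phase averages can be derived, the sketch below divides the cumulative time counters in the `search` section of an `_stats` response by the matching operation counts. The function name `average_phase_ms` and the index name in the usage example are assumptions made for illustration.

```python
def average_phase_ms(stats, index_name):
    """Return the average query and fetch time per operation, in milliseconds.

    `stats` is the parsed JSON body of `GET <index>/_stats`.
    """
    search = stats["indices"][index_name]["total"]["search"]

    def avg(time_key, total_key):
        total = search[total_key]
        return search[time_key] / total if total else 0.0

    return {
        "query_avg_ms": avg("query_time_in_millis", "query_total"),
        "fetch_avg_ms": avg("fetch_time_in_millis", "fetch_total"),
    }
```

A fetch average that is much larger than the query average points at document loading rather than matching as the bottleneck. Note that the counters are cumulative since the shards were started, so comparing two snapshots gives a better picture of recent traffic.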

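Comparing the two phases, we found that fetch was slow on our Thailand supermarket index. Fetch can be thought of as the forward-index lookup that loads the matched documents. We therefore suspected that the oversized item_names field on the merchant index was slowing down data parsing, so we ran a json.unmarshal benchmark on a struct containing a string array. The results showed that the more items there are, the slower the parsing.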
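
The original benchmark used json.unmarshal on a struct with a string array and is not reproduced here. The sketch below is a Python analogue using `json.loads`, swapped in for illustration only. The document shape (`merchant_id`, `item_names`) and the helper names are assumptions, and absolute timings will differ from the original setup.

```python
import json
import timeit


def make_merchant_doc(n_items):
    """Build a JSON string for a merchant with n_items item names."""
    return json.dumps({
        "merchant_id": 1,
        "item_names": [f"item-{i}" for i in range(n_items)],
    })


def bench_decode(n_items, repeat=20):
    """Average seconds needed to decode one document with n_items item names."""
    payload = make_merchant_doc(n_items)
    return timeit.timeit(lambda: json.loads(payload), number=repeat) / repeat


if __name__ == "__main__":
    for n in (10, 1_000, 100_000):
        print(f"{n:>7} items: {bench_decode(n) * 1e6:.1f} us per decode")
```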

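Exploring optimizations

Using `docvalue_fields` instead of `_source`

About docvalue_fields: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-fields.html#docvalue-fields

Very fast when only a few fields are fetched, but still slow when many fields are fetched
Does not support text or text_annotated fields
Does not support fetching values inside nested objects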
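
For reference, a search body that reads doc values and skips `_source` might look like the sketch below. The helper name `build_docvalue_search` and the field names in the usage example are hypothetical; only `_source` and `docvalue_fields` are ES request parameters.

```python
def build_docvalue_search(query, fields, size=20):
    """Build a search body that returns doc values and does not load _source."""
    return {
        "query": query,
        "size": size,
        "_source": False,
        "docvalue_fields": list(fields),
    }
```

For example, `build_docvalue_search({"match_all": {}}, ["merchant_id", "rating"])` asks for two columnar fields only. The listed fields must have doc values enabled, which is why text fields cannot be used here.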

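Using `excludes` to remove some fields from `_source`

About `excludes` (the `_source` mapping option): https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-source-field.htm

Fields listed in `excludes` are removed from `_source`
Even though such a field is not stored in `_source`, we can still search on it (it still exists in the inverted index)
Since the item names on our merchant index are used only for keyword matching and do not need to be returned to the search service, we considered removing the item_names field from `_source`.
We found that even without splitting the query, the cost dropped from over 600 ms to under 100 ms, and store.size shrank from 4.6 GB to 2.4 GB.
Unfortunately, we use partial_update for updates. Updating any field other than item_names would cause item_names to be lost, because partial_update reads the document from `_source` and then replaces the fields being updated (https://www.elastic.co/guide/en/elasticsearch/reference/6.8/mapping-source-field.html).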
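
The data-loss problem can be illustrated with a small simulation. This is a simplified model in plain Python, not ES code: `stored_source` and `partial_update` are hypothetical helpers that mimic what happens to `_source` at index time and during a partial update, and the mapping fragment omits the type level that 6.x mappings require.

```python
# Mapping fragment: index item_names but keep it out of the stored _source.
MAPPING_FRAGMENT = {
    "_source": {"excludes": ["item_names"]},
    "properties": {"item_names": {"type": "text"}},
}


def stored_source(doc, excludes):
    """What is kept as _source after excludes is applied at index time."""
    return {k: v for k, v in doc.items() if k not in excludes}


def partial_update(source, changes):
    """Simplified partial update: merge the changes into the stored _source.

    The merged document is then re-indexed as a whole, so anything missing
    from _source is missing from the new document too.
    """
    merged = dict(source)
    merged.update(changes)
    return merged


original = {"name": "Big Mart", "rating": 4.5, "item_names": ["milk", "rice"]}
source = stored_source(original, MAPPING_FRAGMENT["_source"]["excludes"])
reindexed = partial_update(source, {"rating": 4.7})
print("item_names" in reindexed)  # False: the re-indexed document has no item names
```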

store

About store (https://www.elastic.co/guide/en/elasticsearch/reference/6.8/mapping-store.html): the official ES documentation gives an example use case:

In certain situations it can make sense to `store` a field. For instance, if you have a document with a `title`, a `date`, and a very large `content` field, you may want to retrieve just the `title` and the `date` without having to extract those fields from a large `_source` field.

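This fits our scenario well: we only need to set store on the fields that search has to fetch. Incidentally, for business reasons we have nested (https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html) fields. We cannot set store on fields inside a nested object, but we can add non-nested fields outside the nested object and set store on those individually.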
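
A mapping along these lines might look like the sketch below, written as a Python dict. All field names except item_names (`merchant_id`, `name`, `promotions`, `promo_id`, `promo_ids`) and the helper `stored_field_mapping` are hypothetical, chosen only to show a stored top-level copy of a value that also lives inside a nested object.

```python
def stored_field_mapping(field_types):
    """Build mapping entries where every given field has store enabled."""
    return {name: {"type": es_type, "store": True}
            for name, es_type in field_types.items()}


MERCHANT_PROPERTIES = {
    # Searched but never returned, so it is left unstored.
    "item_names": {"type": "text"},
    # Nested object used for matching (hypothetical field).
    "promotions": {
        "type": "nested",
        "properties": {"promo_id": {"type": "keyword"}},
    },
    # Fields the search service returns, including promo_ids, a top-level
    # copy of promotions.promo_id that the indexing service fills in.
    **stored_field_mapping({
        "merchant_id": "keyword",
        "name": "keyword",
        "promo_ids": "keyword",
    }),
}
```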

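So the work we need to do is:

Indexing service:

Update the ES mapping and set "store": true on the fields that need to be returned
Copy the fields inside nested objects, which cannot be set to store, into new fields, and set store on those new fields

Search service:

By default we read from the source; the corresponding JSON key is _source
For the merchant index, switch from reading _source to requesting stored_fields, and parse the returned arrays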
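
On the search side, the change can be sketched as follows. Stored fields come back under the `fields` key of each hit with every value wrapped in an array, which is why the arrays need parsing. The helper names and the `multi_valued` parameter are assumptions for illustration.

```python
def build_stored_fields_search(query, fields, size=20):
    """Build a search body that requests only the listed stored fields."""
    return {
        "query": query,
        "size": size,
        "_source": False,
        "stored_fields": list(fields),
    }


def parse_stored_hit(hit, multi_valued=()):
    """Flatten the "fields" section of a hit into a plain dict.

    Single-valued fields are unwrapped from their one-element list;
    fields named in multi_valued keep the full list.
    """
    out = {}
    for name, values in hit.get("fields", {}).items():
        out[name] = list(values) if name in multi_valued else values[0]
    return out
```

For example, a hit whose `fields` section is `{"name": ["Big Mart"], "promo_ids": ["p1", "p2"]}` parses to `{"name": "Big Mart", "promo_ids": ["p1", "p2"]}` when `multi_valued=("promo_ids",)` is passed.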

After this optimization went live, ES latency on the affected index dropped by 32.5%, the error rate dropped by 98.1%, and availability reached 99.9%+.

From: https://www.cnblogs.com/wanber/p/17780571.html

Elasticsearch Optimization in Practice for Supermarket Delivery Search
