Paginating with from/size
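By default, when neither from nor size is given, ES returns the first 10 hits. Setting them returns a chosen slice of the results: from is the zero-based offset of the first hit to return (default 0), and size is the number of hits to return (default 10). When a client such as a Java application passes a page number and page size, these have to be mapped onto the two fields.
For a 1-based page number: from = (pageNum - 1) * pageSize, size = pageSize
By default, from/size pagination reaches at most the first 10,000 hits: from + size <= 10000 (the limit is the index.max_result_window setting).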
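The mapping from client-side page parameters to from/size can be sketched as below. This is a minimal illustration, assuming 1-based page numbers and the default result window of 10,000; the function name `page_to_from_size` and the constant `MAX_RESULT_WINDOW` are hypothetical and not part of any ES client.

```python
MAX_RESULT_WINDOW = 10_000  # assumed: the default value of index.max_result_window


def page_to_from_size(page_num: int, page_size: int) -> dict:
    """Translate a 1-based page number into the from/size fields of a search body."""
    if page_num < 1 or page_size < 1:
        raise ValueError("page_num and page_size must be >= 1")
    offset = (page_num - 1) * page_size  # from is a zero-based offset
    if offset + page_size > MAX_RESULT_WINDOW:
        raise ValueError("from + size exceeds the result window; use search_after or scroll")
    return {"from": offset, "size": page_size}
```

For example, page 3 with 20 hits per page maps to from = 40, size = 20. Checking the window on the client side is a design choice here; ES would reject such a request anyway.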
So how do we query when we need to go beyond the first 10,000 hits?
Search After: avoiding the deep-pagination problem
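- search_after cannot jump to an arbitrary page (from cannot be used, although size can), and it can only page forward
- The first search must specify a sort, and the combination of sort values must be unique for every document (here _id serves as the tiebreaker)
- Each subsequent request passes the sort values of the last document in the previous response as search_after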
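The steps above can be sketched as a small helper that derives the next request body from the previous response. It only manipulates dictionaries and does not talk to a cluster; the name `next_page_body` is hypothetical, and the response shape assumed is the standard `hits.hits[].sort` array that ES returns for sorted searches.

```python
import copy
from typing import Optional


def next_page_body(base_body: dict, previous_response: dict) -> Optional[dict]:
    """Return the request body for the next page, or None when there are no more hits."""
    hits = previous_response["hits"]["hits"]
    if not hits:
        return None  # the previous page was empty, so paging is finished
    body = copy.deepcopy(base_body)  # leave the caller's body untouched
    body.pop("from", None)  # from cannot be combined with search_after
    # Continue right after the last document of the previous page.
    body["search_after"] = hits[-1]["sort"]
    return body
```

A caller would send `base_body` first, then keep sending whatever `next_page_body` returns until it yields None. The sort clause must stay identical across all requests.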
DELETE users
POST users/_bulk
{ "index" : {} }
{"name":"user1","age":10}
{ "index" : {} }
{"name":"user2","age":11}
{ "index" : {} }
{"name":"user3","age":12}
{ "index" : {} }
{"name":"user4","age":13}
GET users/_search
{
"query": {
"match_all": {}
}
}
POST users/_search
{
"size": 1,
"query": {
"match_all": {}
},
"sort": [
{"age": "desc"} ,
{"_id": "asc"}
]
}
## Sorted result
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 4,
"relation" : "eq"
},
"max_score" : null,
"hits" : [
{
"_index" : "users",
"_type" : "_doc",
"_id" : "0lZTyoMBXQE6WXGyXiCa",
"_score" : null,
"_source" : {
"name" : "user4",
"age" : 13
},
"sort" : [
13,
"0lZTyoMBXQE6WXGyXiCa"
]
}
]
}
}
## For the next query, pass in the sort values from the previous result
POST users/_search
{
"size": 1,
"query": {
"match_all": {}
},
"search_after": [
13,
"0lZTyoMBXQE6WXGyXiCa"
],
"sort": [
{
"age": "desc"
},
{
"_id": "asc"
}
]
}
## Result
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 4,
"relation" : "eq"
},
"max_score" : null,
"hits" : [
{
"_index" : "users",
"_type" : "_doc",
"_id" : "0VZTyoMBXQE6WXGyXiCa",
"_score" : null,
"_source" : {
"name" : "user3",
"age" : 12
},
"sort" : [
12,
"0VZTyoMBXQE6WXGyXiCa"
]
}
]
}
}
Scroll API
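- The query runs against a snapshot of the data; documents written after the scroll starts cannot be found
- Each request after the first passes the scroll ID returned by the previous one
- The scroll parameter takes the keep-alive time of the scroll search context, e.g. 5m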
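The scroll flow described above can be sketched as a generator. To stay independent of any particular client library, it takes a hypothetical `send(method, path, body)` callable that is assumed to perform the HTTP request and return the parsed JSON response. The endpoints used are `POST /<index>/_search?scroll=...`, `POST /_search/scroll` and `DELETE /_search/scroll`; clearing the context at the end is an addition not covered in the notes here.

```python
from typing import Callable, Iterator


def scroll_all(send: Callable[[str, str, dict], dict], index: str, query: dict,
               page_size: int = 1000, keep_alive: str = "5m") -> Iterator[dict]:
    """Yield every hit of a scroll search; `send` is a hypothetical transport function."""
    response = send("POST", f"/{index}/_search?scroll={keep_alive}",
                    {"size": page_size, "query": query})
    scroll_id = response.get("_scroll_id")
    try:
        while response["hits"]["hits"]:
            yield from response["hits"]["hits"]
            # Each follow-up request passes the latest scroll ID and renews the keep-alive.
            response = send("POST", "/_search/scroll",
                            {"scroll": keep_alive, "scroll_id": scroll_id})
            scroll_id = response.get("_scroll_id", scroll_id)
    finally:
        if scroll_id:
            # Release the search context instead of waiting for it to expire.
            send("DELETE", "/_search/scroll", {"scroll_id": scroll_id})
```

The loop stops at the first empty page, which is how a scroll signals that the snapshot has been fully traversed.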
## First query
POST /users/_search?scroll=5m
{
"size": 1,
"query": {
"match_all" : {
}
}
}
## Pass the scroll_id from the previous response to the scroll request
POST /_search/scroll
{
"scroll" : "1m",
"scroll_id" : "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAADJcWbkhJVy0yNGtSQ0dRdU9UVkVtZU9VQQ=="
}
## Response
{
"_scroll_id" : "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAADJcWbkhJVy0yNGtSQ0dRdU9UVkVtZU9VQQ==",
"took" : 1,
"timed_out" : false,
"terminated_early" : true,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 6,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "users",
"_type" : "_doc",
"_id" : "0FZTyoMBXQE6WXGyXiCa",
"_score" : 1.0,
"_source" : {
"name" : "user2",
"age" : 11
}
}
]
}
}
## Now insert one more document
POST users/_doc
{"name":"user5","age":50}
## Repeating the scroll request above, the results end after user4;
## in other words, the snapshot does not contain the newly added user5
Use cases for the three kinds of paginated queries
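- Regular
Fetch the top few documents in real time, e.g. the latest orders; with no paging parameters, 10 hits are returned by default
- Scroll
Fetch all documents by iterating over a snapshot, e.g. exporting the full data set
- Pagination
from and size
For deep pagination, use Search After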
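As a rough summary, the guidance above can be written as a small decision helper. This is only a sketch of the rule of thumb in these notes; the function name `choose_paging` and its parameters are hypothetical, and the 10,000 threshold assumes the default index.max_result_window.

```python
def choose_paging(need_all_documents: bool, offset: int = 0, size: int = 10,
                  max_result_window: int = 10_000) -> str:
    """Pick a paging style following the use cases listed above."""
    if need_all_documents:
        return "scroll"  # snapshot traversal, e.g. a full export
    if offset + size > max_result_window:
        return "search_after"  # deep pagination beyond the result window
    return "from/size"  # regular or shallow pagination
```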