https://www.elastic.co/guide/en/elasticsearch/reference/5.6/search-request-search-after.html#search-request-search-after
Pagination of results can be done by using from and size, but the cost becomes prohibitive when deep pagination is reached. The index.max_result_window setting, which defaults to 10,000, is a safeguard: search requests take heap memory and time proportional to from + size. The Scroll API is recommended for efficient deep scrolling, but scroll contexts are costly and it is not recommended for real-time user requests. The search_after parameter circumvents this problem by providing a live cursor. The idea is to use the results from the previous page to help the retrieval of the next page.
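To make the cursor idea concrete, here is a minimal in-memory sketch. It is not how Elasticsearch implements search_after; it only shows that the next page can be found from the last row of the previous page, without an offset. The names `sort_rows` and `page_after` and the sample rows are made up for illustration, and the ordering (timestamp ascending, uid descending as the tiebreaker) is an assumption chosen to match the example further down.

```python
from typing import List, Optional, Tuple

Row = Tuple[int, str]  # (timestamp in epoch millis, uid); hypothetical row shape


def sort_rows(rows: List[Row]) -> List[Row]:
    """Order rows by timestamp ascending, then uid descending."""
    by_uid_desc = sorted(rows, key=lambda r: r[1], reverse=True)
    # sorted() is stable, so the uid order is preserved among equal timestamps.
    return sorted(by_uid_desc, key=lambda r: r[0])


def page_after(sorted_rows: List[Row], after: Optional[Row], size: int) -> List[Row]:
    """Return up to `size` rows that come strictly after the cursor row `after`."""
    def is_after(row: Row) -> bool:
        if after is None:  # first page: no cursor yet
            return True
        later_timestamp = row[0] > after[0]
        same_timestamp_smaller_uid = row[0] == after[0] and row[1] < after[1]
        return later_timestamp or same_timestamp_smaller_uid

    return [row for row in sorted_rows if is_after(row)][:size]


rows = sort_rows([(100, "a"), (100, "c"), (100, "b"), (200, "a")])
first_page = page_after(rows, None, 2)
second_page = page_after(rows, first_page[-1], 2)  # cursor = last row of page 1
```

The unique uid matters here: with the timestamp alone, rows sharing a timestamp with the cursor would be either skipped or repeated.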
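This example uses search_after to find log entries containing HTTP within a specific time range and to paginate through them.
The first request does not include search_after; what matters is a well-defined sort order. Entries that share the same timestamp are ordered by _uid as a tiebreaker.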
$ curl 'log.google.io:9200/log-2017.11.14/logstash/_search?pretty=true' -d '{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "_all": "*HTTP*"
          }
        },
        {
          "range": {
            "@timestamp": {
              "gte": "2017-11-14T11:00:00+09:00",
              "lte": "2017-11-14T12:00:00+09:00"
            }
          }
        }
      ]
    }
  },
  "size": 3,
  "sort": [
    { "@timestamp": { "order": "asc" } },
    { "_uid": { "order": "desc" } }
  ]
}'
{
  "took" : 94,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "failed" : 0
  },
  "hits" : {
    "total" : 488876,
    "max_score" : null,
    "hits" : [
      {
        "_index" : "log-2017.11.14",
        "_type" : "logstash",
        "_id" : "log-1d8a4f45-a15f-41bf-95f2-399c88844b60",
        "_score" : null,
        "_source" : {
          "pid" : "13434",
          "severity" : "INFO",
          "ident" : "nova.metadata.wsgi.server",
          "message" : "[-] 1.1.1.1 \"GET /latest/meta-data/ami-id HTTP/1.1\" status: 200 len: 129 time: 0.0031130",
          "hostname" : "eastzone-pg1-api001",
          "node" : "openstack_control",
          "type" : "nova_api",
          "phase" : "pg1",
          "@timestamp" : "2017-11-14T11:00:00+09:00",
          "_uuid" : "log-1d8a4f45-a15f-41bf-95f2-399c88844b60"
        },
        "sort" : [
          1510624800000,
          "logstash#log-1d8a4f45-a15f-41bf-95f2-399c88844b60"
        ]
      },
      {
        "_index" : "log-2017.11.14",
        "_type" : "logstash",
        "_id" : "log-ae067d5d-0d90-4414-ad6c-1c7cf286f95c",
        "_score" : null,
        "_source" : {
          "pid" : "4386",
          "severity" : "INFO",
          "ident" : "nova.osapi_compute.wsgi.server",
          "message" : "[-] 10.197.12.118,10.60.19.248 \"GET /v2/1ba72e3bdbe8491ba851f2f9fb4eb6f1/servers/a6c14368-1816-4908-8cab-0b97fed31f40/ips HTTP/1.1\" status: 401 len: 297 time: 0.0048220",
          "hostname" : "eastzone-pg1-api004",
          "node" : "openstack_control",
          "type" : "nova_api",
          "phase" : "pg1",
          "@timestamp" : "2017-11-14T11:00:00+09:00",
          "_uuid" : "log-ae067d5d-0d90-4414-ad6c-1c7cf286f95c"
        },
        "sort" : [
          1510624800000,
          "logstash#log-ae067d5d-0d90-4414-ad6c-1c7cf286f95c"
        ]
      },
      {
        "_index" : "log-2017.11.14",
        "_type" : "logstash",
        "_id" : "log-9ddf30a2-cfc1-4b76-b50b-5f4bcae02a94",
        "_score" : null,
        "_source" : {
          "pid" : "4431",
          "severity" : "INFO",
          "ident" : "nova.metadata.wsgi.server",
          "message" : "[-] 10.60.34.9,10.60.19.248 \"GET /latest/meta-data/public-ipv4 HTTP/1.1\" status: 200 len: 116 time: 0.0032690",
          "hostname" : "eastzone-pg1-api004",
          "node" : "openstack_control",
          "type" : "nova_api",
          "phase" : "pg1",
          "@timestamp" : "2017-11-14T11:00:00+09:00",
          "_uuid" : "log-9ddf30a2-cfc1-4b76-b50b-5f4bcae02a94"
        },
        "sort" : [
          1510624800000,
          "logstash#log-9ddf30a2-cfc1-4b76-b50b-5f4bcae02a94"
        ]
      }
    ]
  }
}
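For the next page, build the uid from the uuid of the last hit. The uid is the type and the uuid joined together, as in logstash#log-9ddf30a2-cfc1-4b76-b50b-5f4bcae02a94. (By default the uuid on its own cannot be used: it is neither searchable nor indexed.)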
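As a small sketch of this step, the cursor values can be derived from the last hit of the previous response. The helper names `make_uid` and `search_after_from_hit` are hypothetical, and the "#" separator is inferred from the sort values in the response above rather than from any API guarantee. Since each hit already carries its sort array, copying that array is likely the simplest route; rebuilding the uid by hand is shown only to confirm that the two agree.

```python
from typing import Any, Dict, List


def make_uid(doc_type: str, doc_id: str) -> str:
    """Join type and id with '#', as seen in the sort values of the response."""
    return f"{doc_type}#{doc_id}"


def search_after_from_hit(hit: Dict[str, Any]) -> List[Any]:
    """Return a copy of the hit's sort values to use as the next search_after."""
    return list(hit["sort"])


# Last hit of the first page, trimmed to the fields used here.
last_hit = {
    "_type": "logstash",
    "_id": "log-9ddf30a2-cfc1-4b76-b50b-5f4bcae02a94",
    "sort": [1510624800000, "logstash#log-9ddf30a2-cfc1-4b76-b50b-5f4bcae02a94"],
}

cursor = search_after_from_hit(last_hit)
rebuilt_uid = make_uid(last_hit["_type"], last_hit["_id"])
```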
$ curl 'log.google.io:9200/log-2017.11.14/logstash/_search?pretty=true' -d '{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "_all": "*HTTP*"
          }
        },
        {
          "range": {
            "@timestamp": {
              "gte": "2017-11-14T11:00:00+09:00",
              "lte": "2017-11-14T12:00:00+09:00"
            }
          }
        }
      ]
    }
  },
  "search_after": [1510624800000, "logstash#log-9ddf30a2-cfc1-4b76-b50b-5f4bcae02a94"],
  "size": 3,
  "sort": [
    { "@timestamp": { "order": "asc" } },
    { "_uid": { "order": "desc" } }
  ]
}'
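Repeating this by hand for every page gets tedious, so the two requests can be folded into a loop. The following is only a sketch under assumptions: `iter_hits` is a hypothetical helper, and `search` stands for any callable that sends a request body to the _search endpoint and returns the parsed JSON response (for example, a thin wrapper around an HTTP client). The loop stops when a page comes back empty.

```python
import copy
from typing import Any, Callable, Dict, Iterator, Optional

SearchFn = Callable[[Dict[str, Any]], Dict[str, Any]]  # body -> parsed response


def iter_hits(
    search: SearchFn,
    body: Dict[str, Any],
    max_pages: Optional[int] = None,
) -> Iterator[Dict[str, Any]]:
    """Yield hits page by page, feeding each page's last sort values back in."""
    request = copy.deepcopy(body)  # leave the caller's body untouched
    pages = 0
    while max_pages is None or pages < max_pages:
        hits = search(request)["hits"]["hits"]
        if not hits:  # an empty page means there is nothing left
            return
        yield from hits
        # The sort array of the last hit becomes the cursor for the next page.
        request["search_after"] = hits[-1]["sort"]
        pages += 1
```

The query, size, and sort stay identical across requests; only search_after changes. Passing a `max_pages` limit is a cheap guard for interactive use, since the example query matches 488876 documents.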