Elasticsearch

[elasticsearch5] search_after example

'김용환' 2017. 11. 24. 11:20


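Elasticsearch 5.x has search_after, an alternative to the heavy from+size and scroll API approaches to deep pagination.

When I used scroll in the past, I saw how heavy it was, and memory became an issue (scroll is stateful: it keeps a search context open on the cluster).

search_after provides a live cursor and is stateless, so it looks like a better fit when performance matters.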


https://www.elastic.co/guide/en/elasticsearch/reference/5.6/search-request-search-after.html#search-request-search-after


Pagination of results can be done by using the from and size but the cost becomes prohibitive when the deep pagination is reached. The index.max_result_window which defaults to 10,000 is a safeguard, search requests take heap memory and time proportional to from + size. The Scroll api is recommended for efficient deep scrolling but scroll contexts are costly and it is not recommended to use it for real time user requests. The search_after parameter circumvents this problem by providing a live cursor. The idea is to use the results from the previous page to help the retrieval of the next page.
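To put the "proportional to from + size" remark in numbers, here is a rough sketch in Python. It assumes, as a simplification, that every shard collects its own top from + size hits and that the coordinating node merges all of them. The function name deep_paging_cost and the shard count of 3 are made up for illustration.

```python
def deep_paging_cost(from_, size, num_shards):
    """Rough count of hits collected for one from+size request.

    Simplified model: each shard collects its top (from + size) hits and
    the coordinating node merges all of them before dropping the first `from`.
    """
    per_shard = from_ + size
    return {"per_shard": per_shard, "merged_on_coordinator": per_shard * num_shards}


# First page versus a deep page, both with size=3 on 3 shards (assumed values).
print(deep_paging_cost(0, 3, 3))     # {'per_shard': 3, 'merged_on_coordinator': 9}
print(deep_paging_cost(9997, 3, 3))  # {'per_shard': 10000, 'merged_on_coordinator': 30000}
```

Both requests return only 3 hits, but the deep page does far more work to get there.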





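This example uses search_after to find log entries containing HTTP within a specific time range and paginate through them.


The first request does not include search_after; what matters there is getting the sort order right. Many documents share the same timestamp, so _uid is added as a second sort key to break ties.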



$ curl log.google.io:9200/log-2017.11.14/logstash/_search?pretty=true -d '{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "_all": "*HTTP*"
          }
        },
        {
          "range": {
            "@timestamp": {
              "gte": "2017-11-14T11:00:00+09:00",
              "lte": "2017-11-14T12:00:00+09:00"
            }
          }
        }
      ]
    }
  },
  "size": 3,
  "sort": [
    { "@timestamp": { "order": "asc" } },
    { "_uid": { "order": "desc" } }
  ]

}'
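The same request body can also be assembled in code. The following is a minimal sketch in Python, not part of the original test; build_search_body and its parameters are names I made up for illustration. It only builds the JSON and does not contact the cluster.

```python
import json


def build_search_body(start, end, size=3, search_after=None):
    """Build the request body used in this post.

    The sort uses @timestamp plus _uid as a tiebreaker so that every hit
    has a unique sort key. search_after is left out for the first page.
    """
    body = {
        "query": {
            "bool": {
                "must": [
                    {"match": {"_all": "*HTTP*"}},
                    {"range": {"@timestamp": {"gte": start, "lte": end}}},
                ]
            }
        },
        "size": size,
        "sort": [
            {"@timestamp": {"order": "asc"}},
            {"_uid": {"order": "desc"}},
        ],
    }
    if search_after is not None:
        # One value per sort key, in the same order as "sort".
        body["search_after"] = list(search_after)
    return body


first_page = build_search_body("2017-11-14T11:00:00+09:00", "2017-11-14T12:00:00+09:00")
print(json.dumps(first_page, indent=2))
```

Passing search_after later reuses exactly the same query and sort, which is what the cursor relies on.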





The curl request above returns a result like the following.



{
  "took" : 94,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "failed" : 0
  },
  "hits" : {
    "total" : 488876,
    "max_score" : null,
    "hits" : [
      {
        "_index" : "log-2017.11.14",
        "_type" : "logstash",
        "_id" : "log-1d8a4f45-a15f-41bf-95f2-399c88844b60",
        "_score" : null,
        "_source" : {
          "pid" : "13434",
          "severity" : "INFO",
          "ident" : "nova.metadata.wsgi.server",
          "message" : "[-] 1.1.1.1 \"GET /latest/meta-data/ami-id HTTP/1.1\" status: 200 len: 129 time: 0.0031130",
          "hostname" : "eastzone-pg1-api001",
          "node" : "openstack_control",
          "type" : "nova_api",
          "phase" : "pg1",
          "@timestamp" : "2017-11-14T11:00:00+09:00",
          "_uuid" : "log-1d8a4f45-a15f-41bf-95f2-399c88844b60"
        },
        "sort" : [
          1510624800000,
          "logstash#log-1d8a4f45-a15f-41bf-95f2-399c88844b60"
        ]
      },
      {
        "_index" : "log-2017.11.14",
        "_type" : "logstash",
        "_id" : "log-ae067d5d-0d90-4414-ad6c-1c7cf286f95c",
        "_score" : null,
        "_source" : {
          "pid" : "4386",
          "severity" : "INFO",
          "ident" : "nova.osapi_compute.wsgi.server",
          "message" : "[-] 10.197.12.118,10.60.19.248 \"GET /v2/1ba72e3bdbe8491ba851f2f9fb4eb6f1/servers/a6c14368-1816-4908-8cab-0b97fed31f40/ips HTTP/1.1\" status: 401 len: 297 time: 0.0048220",
          "hostname" : "eastzone-pg1-api004",
          "node" : "openstack_control",
          "type" : "nova_api",
          "phase" : "pg1",
          "@timestamp" : "2017-11-14T11:00:00+09:00",
          "_uuid" : "log-ae067d5d-0d90-4414-ad6c-1c7cf286f95c"
        },
        "sort" : [
          1510624800000,
          "logstash#log-ae067d5d-0d90-4414-ad6c-1c7cf286f95c"
        ]
      },
      {
        "_index" : "log-2017.11.14",
        "_type" : "logstash",
        "_id" : "log-9ddf30a2-cfc1-4b76-b50b-5f4bcae02a94",
        "_score" : null,
        "_source" : {
          "pid" : "4431",
          "severity" : "INFO",
          "ident" : "nova.metadata.wsgi.server",
          "message" : "[-] 10.60.34.9,10.60.19.248 \"GET /latest/meta-data/public-ipv4 HTTP/1.1\" status: 200 len: 116 time: 0.0032690",
          "hostname" : "eastzone-pg1-api004",
          "node" : "openstack_control",
          "type" : "nova_api",
          "phase" : "pg1",
          "@timestamp" : "2017-11-14T11:00:00+09:00",
          "_uuid" : "log-9ddf30a2-cfc1-4b76-b50b-5f4bcae02a94"
        },
        "sort" : [
          1510624800000,
          "logstash#log-9ddf30a2-cfc1-4b76-b50b-5f4bcae02a94"
        ]
      }
    ]
  }
}



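For the next page, build the _uid from the _uuid of the last hit. The _uid is the type and the _uuid joined with #, for example logstash#log-9ddf30a2-cfc1-4b76-b50b-5f4bcae02a94, which is the same value that appears in the sort array of the last hit. (By default the _uuid itself cannot be used here: it is neither searchable nor indexed.) Pass the timestamp and the _uid to search_after, keeping the query and sort unchanged.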



$ curl log.google.io:9200/log-2017.11.14/logstash/_search?pretty=true -d '{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "_all": "*HTTP*"
          }
        },
        {
          "range": {
            "@timestamp": {
              "gte": "2017-11-14T11:00:00+09:00",
              "lte": "2017-11-14T12:00:00+09:00"
            }
          }
        }
      ]
    }
  },
  "search_after": ["1510624800000", "logstash#log-9ddf30a2-cfc1-4b76-b50b-5f4bcae02a94"],
  "size": 3,
  "sort": [
    { "@timestamp": { "order": "asc" } },
    { "_uid": { "order": "desc" } }
  ]

}'
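The two search_after values in the request above do not have to be built by hand, since they are exactly the sort array of the last hit on the previous page. A small sketch in Python, assuming the response has already been parsed into a dict; next_search_after is a hypothetical helper name, and the sample is a trimmed copy of the last hit shown earlier.

```python
def next_search_after(response):
    """Return the sort values of the last hit, or None when the page is empty."""
    hits = response["hits"]["hits"]
    if not hits:
        return None
    return hits[-1]["sort"]


# Trimmed copy of the last hit from the first response in this post.
sample = {
    "hits": {
        "hits": [
            {
                "_type": "logstash",
                "_id": "log-9ddf30a2-cfc1-4b76-b50b-5f4bcae02a94",
                "sort": [
                    1510624800000,
                    "logstash#log-9ddf30a2-cfc1-4b76-b50b-5f4bcae02a94",
                ],
            }
        ]
    }
}

cursor = next_search_after(sample)
last = sample["hits"]["hits"][-1]
# The second sort value is the _uid, i.e. "<type>#<id>".
assert cursor[1] == last["_type"] + "#" + last["_id"]
print(cursor)
```

An empty page returns None, which is a convenient signal to stop paginating.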





The next page of results comes back as expected.
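Repeating this step gives a full pagination loop. Below is a rough sketch in Python rather than anything I ran against the cluster: iterate_hits and fetch are hypothetical names, and fake_fetch is a stand-in for Elasticsearch so the loop can be run without a server. In real use, fetch would POST the body to the _search endpoint and return the parsed JSON.

```python
import copy


def iterate_hits(fetch, base_body, max_pages=None):
    """Yield hits page by page using search_after.

    fetch: callable that takes a request body (dict) and returns the parsed
           search response (dict). Hypothetical; plug in your own HTTP call.
    base_body: query, size and sort, without search_after.
    """
    body = copy.deepcopy(base_body)
    pages = 0
    while max_pages is None or pages < max_pages:
        hits = fetch(body)["hits"]["hits"]
        if not hits:
            break  # an empty page means there is nothing left
        for hit in hits:
            yield hit
        # The cursor for the next page is the sort array of the last hit.
        body["search_after"] = hits[-1]["sort"]
        pages += 1


def fake_fetch(body):
    """Stand-in for Elasticsearch: 7 documents sorted by one numeric key."""
    data = [{"_id": str(i), "sort": [i]} for i in range(1, 8)]
    after = body.get("search_after")
    remaining = [d for d in data if after is None or d["sort"] > after]
    return {"hits": {"hits": remaining[: body["size"]]}}


ids = [hit["_id"] for hit in iterate_hits(fake_fetch, {"size": 3, "sort": ["n"]})]
print(ids)  # ['1', '2', '3', '4', '5', '6', '7']
```

No state is kept on the server between pages; the only thing carried over is the cursor in the request body.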