Here is a simple Python example for checking whether Elasticsearch is running normally.

First, install the elasticsearch Python module.


$ pip install elasticsearch



The client has no dedicated alive check, so add code that creates and then deletes a dummy index.


from elasticsearch import Elasticsearch

es = Elasticsearch(host=host, port=9200)

..

    try:
        es.indices.create(index='test-index', ignore=400)
        es.indices.delete(index='test-index', ignore=[400, 404])
    except Exception:
        put_err('server down', es_host[count])





Among the Elasticsearch REST APIs, _nodes/stats shows the current state of the ES nodes.




$ curl -XGET 'http://<hostname>:9200/_nodes/stats'

{"cluster_name":"google_es","nodes":{"kpG92KHDSoO7NhvfvCetog":{"timestamp":1458295650270,"name":"green045","transport_address":"inet[/172.17.50.245:9300]","host":"green045.kr3.iwilab.com","ip":["inet[/172.17.50.245:9300]","NONE"],"attributes":{"master":"true"},"indices":{"docs":{"count":2049610,"deleted":0},"store":{"size_in_bytes":2780497067,"throttle_time_in_millis":0},"indexing":{"index_total":82828641,"index_time_in_millis":33104647,"index_current":5373,"delete_total":0,"delete_time_in_millis":0,"delete_current":0,"noop_update_total":0,"is_throttled":false,"throttle_time_in_millis":536},"get":{"total":4,"time_in_millis":0,"exists_total":1,"exists_time_in_millis":0,"missing_total":3,"missing_time_in_millis":0,"current":0},"search":{"open_contexts":0,"query_total":89666361,"query_time_in_millis":42660274,"query_current":0,"fetch_total":89666355,"fetch_time_in_millis":5548098,"fetch_current":0},"merges":{"current":0,"current_docs":0,"current_size_in_bytes":0,"total":2674,"total_time_in_millis":22264700,"total_docs":111622903,"total_size_in_bytes":300312239738},"refresh":{"total":17402,"total_time_in_millis":22551946},"flush":{"total":1609,"total_time_in_millis":6980833},"warmer":{"current":0,"total":261,"total_time_in_millis":53},"filter_cache":{"memory_size_in_bytes":478164,"evictions":0},"id_cache":{"memory_size_in_bytes":0},"fielddata":{"memory_size_in_bytes":802760,"evictions":9},"percolate":{"total":0,"time_in_millis":0,"current":0,"memory_size_in_bytes":-1,"memory_size":"-1b","queries":0},"completion":{"size_in_bytes":92612272},"segments":{"count":34,"memory_in_bytes":124820108,"index_writer_memory_in_bytes":0,"index_writer_max_memory_in_bytes":2048000,"version_map_memory_in_bytes":0,"fixed_bit_set_memory_in_bytes":0},"translog":{"operations":0,"size_in_bytes":17},"suggest":{"total":16763,"time_in_millis":6103,"current":0},"query_cache":{"memory_size_in_bytes":0,"evictions":0,"hit_count":0,"miss_count":0}},"os":{"timestamp":1458295650270,"uptime_in_millis":18
157670,"load_average":[0.0,0.0,0.0],"cpu":{"sys":0,"user":0,"idle":99,"usage":0,"stolen":0},"mem":{"free_in_bytes":6580514816,"used_in_bytes":27076222976,"free_percent":29,"used_percent":70,"actual_free_in_bytes":9924431872,"actual_used_in_bytes":23732305920},"swap":{"used_in_bytes":0,"free_in_bytes":10737414144}},"process":{"timestamp":1458295650271,"open_file_descriptors":289,"cpu":{"percent":0,"sys_in_millis":7528750,"user_in_millis":162504230,"total_in_millis":170032980},"mem":{"resident_in_bytes":22813118464,"share_in_bytes":120283136,"total_virtual_in_bytes":27931869184}},"jvm":{"timestamp":1458295650271,"uptime_in_millis":10310773097,"mem":{"heap_used_in_bytes":13092004720,"heap_used_percent":61,"heap_committed_in_bytes":21405106176,"heap_max_in_bytes":21405106176,"non_heap_used_in_bytes":110885664,"non_heap_committed_in_bytes":113668096,"pools":{"young":{"used_in_bytes":529111904,"max_in_bytes":558432256,"peak_used_in_bytes":558432256,"peak_max_in_bytes":558432256},"survivor":{"used_in_bytes":3899608,"max_in_bytes":69730304,"peak_used_in_bytes":69730304,"peak_max_in_bytes":69730304},"old":{"used_in_bytes":12558993208,"max_in_bytes":20776943616,"peak_used_in_bytes":15920147088,"peak_max_in_bytes":20776943616}}},"threads":{"count":611,"peak_count":614},"gc":{"collectors":{"young":{"collection_count":78537,"collection_time_in_millis":2869917},"old":{"collection_count":140,"collection_time_in_millis":8780}}},"buffer_pools":{"direct":{"count":15882,"used_in_bytes":283727750,"total_capacity_in_bytes":283727750},"mapped":{"count":0,"used_in_bytes":0,"total_capacity_in_bytes":0}}},"thread_pool":{"percolate":{"threads":0,"queue":0,"active":0,"rejected":0,"largest":0,"completed":0},"listener":{"threads":4,"queue":0,"active":0,"rejected":0,"largest":4,"completed":194262},"index":{"threads":4,"queue":0,"active":0,"rejected":0,"largest":4,"completed":4},"refresh":{"threads":1,"queue":0,"active":0,"rejected":0,"largest":4,"completed":9120},"suggest":{"threads":256,"queue"
:0,"active":0,"rejected":0,"largest":256,"completed":6541558},"generic":{"threads":5,"queue":0,"active":0,"rejected":0,"largest":5,"completed":1242222},"warmer":{"threads":1,"queue":0,"active":0,"rejected":0,"largest":3,"completed":20792},"search":{"threads":256,"queue":0,"active":0,"rejected":0,"largest":256,"completed":179332719},"flush":{"threads":1,"queue":0,"active":0,"rejected":0,"largest":4,"completed":8768},"optimize":{"threads":1,"queue":0,"active":0,"rejected":0,"largest":1,"completed":1},"management":{"threads":5,"queue":0,"active":1,"rejected":0,"largest":5,"completed":2955615},"get":{"threads":0,"queue":0,"active":0,"rejected":0,"largest":0,"completed":0},"merge":{"threads":1,"queue":0,"active":0,"rejected":0,"largest":4,"completed":26901},"bulk":{"threads":8,"queue":0,"active":0,"rejected":0,"largest":8,"completed":12738},"snapshot":{"threads":1,"queue":0,"active":0,"rejected":0,"largest":2,"completed":2157}},"network":{"tcp":{"active_opens":1491570,"passive_opens":4946581,"curr_estab":82,"in_segs":443092500,"out_segs":317298535,"retrans_segs":38975,"estab_resets":1384179,"attempt_fails":3585983,"in_errs":0,"out_rsts":38442}},"fs":{"timestamp":1458295650271,"total":{"total_in_bytes":284037365760,"free_in_bytes":180158369792,"available_in_bytes":165723308032,"disk_reads":74699,"disk_writes":12144405,"disk_io_op":12219104,"disk_read_size_in_bytes":2125903872,"disk_write_size_in_bytes":689377091584,"disk_io_size_in_bytes":691502995456,"disk_queue":"0","disk_service_time":"0"},"data":[{"path":"/home/elasticsearch/data/google_es/nodes/0","mount":"/","dev":"/dev/sda2","total_in_bytes":284037365760,"free_in_bytes":180158369792,"available_in_bytes":165723308032,"disk_reads":74699,"disk_writes":12144405,"disk_io_op":12219104,"disk_read_size_in_bytes":2125903872,"disk_write_size_in_bytes":689377091584,"disk_io_size_in_bytes":691502995456,"disk_queue":"0","disk_service_time":"0"}]},"transport":{"server_open":13,"rx_count":6,"rx_size_in_bytes":1860,"tx_count":6,"t
x_size_in_bytes":1860},"http":{"current_open":54,"total_opened":1080551},"breakers":{"request":{"limit_size_in_bytes":8562042470,"limit_size":"7.9gb","estimated_size_in_bytes":0,"estimated_size":"0b","overhead":1.0,"tripped":0},"fielddata":{"limit_size_in_bytes":12843063705,"limit_size":"11.9gb","estimated_size_in_bytes":802760,"estimated_size":"783.9kb","overhead":1.03,"tripped":0},"parent":{"limit_size_in_bytes":14983574323,"limit_size":"13.9gb","estimated_size_in_bytes":802760,"estimated_size":"783.9kb","overhead":1.0,"tripped":0}}}}}



You can check CPU, thread, memory, disk, and merge information.


In Python, it can be written roughly like this.


from elasticsearch import Elasticsearch

es = Elasticsearch(host="salmon001.dakao.io", port=9200)

stat = es.nodes.stats(node_id="salmon001")['nodes']

..
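For instance, a small helper can pull the interesting numbers (heap, load, disk, document count) out of the `nodes` dict above. This is a sketch; the key paths follow the 1.x stats format shown in the curl output.

```python
def summarize_nodes(nodes):
    """Summarize per-node health figures from a 1.x _nodes/stats response.

    `nodes` is the value of the top-level "nodes" key; returns one
    summary dict per node, keyed by node name.
    """
    summary = {}
    for node_id, stat in nodes.items():
        summary[stat['name']] = {
            'heap_used_percent': stat['jvm']['mem']['heap_used_percent'],
            'load_average': stat['os']['load_average'],
            'disk_free_bytes': stat['fs']['total']['free_in_bytes'],
            'doc_count': stat['indices']['docs']['count'],
        }
    return summary
```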




Posted by '김용환'
,



There is a good article about Elasticsearch (assigning a custom ranking script via function score, plus decay functions), so I am reposting it.



Source: https://www.elastic.co/blog/found-function-scoring




curl -XPOST http://localhost:9200/searchtube/_search -d '
{
  "query": {
    "function_score": {
      "query": { "match": { "_all": "severed" } },
      "script_score": {
        "script": "_score * log(doc[\"likes\"].value + doc[\"views\"].value + 1)"
      }
    }
  }
}'



{
  "query": {
    "function_score": {
      "functions": [
        { "gauss": { "created_at": { "origin": "2014-04-22T23:50:00", "scale": "12h", "offset": "1h", "decay": 0.3 } } },
        { "gauss": { "likes": { "origin": 20000, "scale": 20000 } } },
        { "gauss": { "views": { "origin": 20000, "scale": 20000 } } }
      ]
    }
  }
}



Besides gauss, there are also linear and exp decay functions.
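The decay clauses above are regular enough to generate. A hedged Python sketch that builds a function_score body from (field, params) pairs; the structure mirrors the JSON above, and the decay type can be any of gauss, linear, or exp:

```python
def decay_score_query(decays, decay_type='gauss'):
    """Build a function_score search body from (field, params) pairs.

    decay_type may be 'gauss', 'linear', or 'exp'; each params dict
    holds origin/scale and optionally offset/decay, as in the REST body.
    """
    functions = [{decay_type: {field: params}} for field, params in decays]
    return {'query': {'function_score': {'functions': functions}}}
```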


Posted by '김용환'
,


Elasticsearch filters are faster than queries. In one case, applying a cache to a filter made it 1000x faster. If memory is not a concern, using simple filtered queries like this also seems like a good idea.

Reference: http://knight76.tistory.com/entry/elasticsearch-Query-vs-Filter


Query version:

curl -XGET http://es.google.com:9200/_search -d '
{
  "query": {
    "bool": {
      "must": [
        { "match_all": {} },
        { "term": { "user.id": "5681810" } }
      ]
    }
  }
}'



This can be converted into a filter with caching applied for faster processing. When I tested it, the filter's processing time seemed to drop to about 1/10 of the query's. (Much faster.)


curl -XGET http://es.google.com:9200/_search -d '
{
  "query": {
    "filtered": {
      "filter": {
        "term": {
          "user.id": "5681810",
          "_cache": true
        }
      }
    }
  }
}'



The filter cache caches filter results.

According to the reference documents below, it caches up to indices.cache.filter.size (default 10%).

indices.cache.filter.size can set the cache size as either a percentage or an absolute size.




Even when you use filters and the filter cache, performance differs depending on which filter you use.

The bool filter is an implementation optimized to combine document bitset results with fast bitwise Boolean operations, so it is faster than and/or/not filter groups.



The test method I used relied on curl and its timing variables, as follows.


for i in {1..100}; do curl -s -w "%{time_namelookup} %{time_connect} %{time_appconnect} %{time_pretransfer} %{time_redirect} %{time_starttransfer} %{time_total}\n" -o /dev/null -XPOST 'http://es.google.com/_search?routing=123124&pretty' -d '
{
  "size": 50,
  "query": {
    "bool": {
      "must": [
        { "term": { "user.id": 123124 } },
        { "term": { "user.type": "develop" } }
      ]
    }
  },
  "from": 0
}' ; done




for i in {1..100}; do curl -s -w "%{time_namelookup} %{time_connect} %{time_appconnect} %{time_pretransfer} %{time_redirect} %{time_starttransfer} %{time_total}\n" -o /dev/null -XPOST 'http://es.google.com/_search?routing=123124&pretty' -d '
{
  "size": 50,
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            { "term": { "user.id": 123124 } },
            { "term": { "user.type": "develop" } }
          ],
          "_cache": true
        }
      }
    }
  },
  "from": 0
}' ; done


In my tests, the bool filter was about 100 ms faster than the bool query.
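The shell loop above can also be replicated in Python. A sketch of a tiny timing harness; the search call itself is passed in, so any client or requests-based function will do:

```python
import time

def time_calls(fn, runs=100):
    """Call fn() `runs` times and return (best, average) seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return min(samples), sum(samples) / len(samples)
```

Running the query body and the filtered body through the same harness keeps the measurement method identical for both.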






References:

https://www.elastic.co/guide/en/elasticsearch/guide/current/filter-caching.html

https://www.elastic.co/guide/en/elasticsearch/reference/1.4/index-modules-cache.html#filter


Posted by '김용환'
,


Elasticsearch's scroll API is used to prevent query results from shifting as frequently-changing data changes, and to search the data as of the time the search started. The downside is that this information is kept in memory, stored as a "search context". If these contexts keep piling up, they can cause a memory shortage, so to keep them from lingering in memory too long you can set a scroll timeout.


To set the timeout, use the scroll parameter.



curl -XGET 'localhost:9200/twitter/tweet/_search?scroll=1m' -d '
{
    "query": {
        "match" : {
            "title" : "elasticsearch"
        }
    }
}'



To use the Java native client, call the setScroll() method when building the search request.


client.prepareSearch(index).setTypes(type)
    .setQuery(query).setScroll(TimeValue.timeValueMinutes(2))




After using scroll for a while, I picked up some know-how of my own.


1. The scroll timeout is the time Elasticsearch keeps the results, but setting it to 10 minutes does not mean the context lives exactly 10 minutes; it can live for 3 hours. It is managed internally.

2. If you set the scroll timeout too short, a timeout exception can occur, and then the client runs into problems too. As I recall it is a runtime exception, so watch for it carefully. The right scroll timeout value seems to be something you only find through testing.

3. Avoid running many scrolls in the same time window. Memory usage can grow in proportion to the data and the number of scroll ids, so it is better to use scroll against Elasticsearch sparingly, like a batch job.

4. Scroll cannot do page addressing; there is no start offset. All you can do is keep reading chunk by chunk.

5. Avoid doing a lot of work at once while using scroll. Timeouts can occur, so when heavy work is needed, do it outside the scroll loop.

That is, in code that dumps JSON while scrolling as below, it is fine as long as dump_json() does not take too long. (This also needs testing.)


while (scroll != end) {
    scroll_data = get scroll
    dump_json(scroll_data)
}


However, if it does take long, collect inside the loop and dump afterwards:

scroll_list = []
while (scroll != end) {
    scroll_data = get scroll
    add(scroll_list, scroll_data)
}

dump_json(scroll_list)



6. Not only scroll timeout errors but all kinds of errors can occur at any time during scroll work, so it is good to keep watching which errors come out of the scroll.
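Tip 5 can be sketched in Python as a lazy batch iterator. The search and scroll calls are injected here (the function names are illustrative, not a specific client API), so heavy per-batch work stays outside the fetch itself:

```python
def iter_scroll_batches(start_scroll, next_scroll):
    """Yield batches of hits from a scroll until a page comes back empty.

    start_scroll() issues the initial search with a scroll timeout;
    next_scroll(scroll_id) fetches the following page. Both return
    dicts shaped like Elasticsearch search responses.
    """
    resp = start_scroll()
    while resp['hits']['hits']:
        yield resp['hits']['hits']
        resp = next_scroll(resp['_scroll_id'])
```

Because it is a generator, the caller decides whether to dump each batch immediately or to accumulate them and dump once after the loop.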




Posted by '김용환'
,



When calling the bulk API, an error can occur because of Elasticsearch's HTTP content length limit.

This is due to Netty's size limit.


[WARN ][http.netty               ] [node_1] Caught exception while handling client http traffic, closing connection [id: 0x7160d228, /1.1.1.1:56212 => /1.2.2.2:9200]

org.elasticsearch.common.netty.handler.codec.frame.TooLongFrameException: HTTP content length exceeded 104857600 bytes.

at org.elasticsearch.common.netty.handler.codec.http.HttpChunkAggregator.messageReceived(HttpChunkAggregator.java:169)




On the client side there is no way to tell, because the request simply fails without returning anything.


To fix the problem, change the commented-out http setting in elasticsearch.yml.


Originally the configuration looks like this:

# Set a custom allowed content length:

#

#http.max_content_length: 100mb


Modify it as follows. Merely removing the comment does not help, because the commented value is the default, which is exactly what causes the problem.


# Set a custom allowed content length:

#

http.max_content_length: 200mb
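Alternatively, the client can split its bulk payload so each request stays under the limit. A sketch, assuming each element of `actions` is an already-serialized action-plus-source pair, so a pair is never split across requests:

```python
def split_bulk_payload(actions, max_bytes=100 * 1024 * 1024):
    """Yield newline-delimited bulk bodies no larger than max_bytes.

    Each element of `actions` must be a complete serialized bulk action
    (metadata line plus optional source line) without a trailing newline.
    """
    chunk, size = [], 0
    for action in actions:
        n = len(action.encode('utf-8')) + 1  # +1 for the trailing newline
        if chunk and size + n > max_bytes:
            yield '\n'.join(chunk) + '\n'
            chunk, size = [], 0
        chunk.append(action)
        size += n
    if chunk:
        yield '\n'.join(chunk) + '\n'
```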



Posted by '김용환'
,



When using ES 1.x, the prefix filter does not support arrays.


As the documentation shows, it can only be used against a single field.

https://www.elastic.co/guide/en/elasticsearch/reference/1.4/query-dsl-prefix-filter.html#_caching_14


{ "constant_score" : { "filter" : { "prefix" : { "user" : "ki" } } } }


Simple prefix filter Java code looks like the following.



public List<Citizen> getAddressFromPrefix(List<String> prefixs) {
    List<Citizen> citizens = Lists.newArrayList();
    prefixs.forEach(prefix -> {
        FilterBuilder filter = prefixFilter("address", prefix);

        SearchResponse response = searchClient.prepareSearch("citizen").setTypes("address")
            .setPostFilter(filter)
            .setSize(Integer.MAX_VALUE)
            .execute()
            .actionGet();

        SearchHits searchHits = response.getHits();
        for (SearchHit searchHit : searchHits.getHits()) {
            Map<String, Object> map = searchHit.getSource();
            Citizen citizen = new Citizen();
            citizen.id = searchHit.getId();
            citizen.address = (String) map.get("address");
            citizens.add(citizen);
        }

        //citizens.forEach(citizen -> System.out.println(citizen.id + ":" + citizen.address));
    });

    return citizens;
}




References:

https://www.elastic.co/guide/en/elasticsearch/client/java-api/1.7/search.html

https://www.elastic.co/guide/en/elasticsearch/client/java-api/1.7/query-dsl-filters.html

https://www.elastic.co/guide/en/elasticsearch/client/java-api/1.7/prefix-filter.html







Posted by '김용환'
,


While using Elasticsearch, I ran into, and resolved, a situation where the JVM could not escape full GC.


When you delete one large index in Elasticsearch and then add data to a large index with bulk operations,

the JVM can hang if memory runs short.


Elasticsearch index deletion is asynchronous: it does not delete all the data immediately, and it answers a REST or admin delete request right away. Therefore, if memory is very tight, it is good to add an appropriate sleep, and to run Elasticsearch with a larger memory allocation.

Posted by '김용환'
,


Elasticsearch has a warmer API; it is a feature that has existed since 1.4.0 beta1.


https://www.elastic.co/guide/en/elasticsearch/reference/1.4/indices-warmers.html


Since Elasticsearch is a Java application, data must be loaded into memory for speed,

so warming up (preloading) means loading file or segment data into memory in advance.

Especially with large data sets, you should warm up. Looking at the API, you can direct memory loading for an index/template, or memory loading in preparation for searches against a specific index.


Warmers support GET/PUT/DELETE operations.

PUT _warmer/{warmer_name}
PUT /{index}/_warmer/{warmer_name}
PUT /{index}/{type}/_warmer/{warmer_name}

DELETE /{index}/_warmer/{name}

GET {index}/_warmer/{warmer_name}
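The routes above can be assembled programmatically. A hedged sketch that builds a (method, path, body) triple for registering a warmer; the helper name is illustrative, and the path shape follows the routes listed above (the warmer body is a regular search body):

```python
def put_warmer_request(index, name, query, doc_type=None):
    """Build the HTTP request triple for registering a warmer.

    Follows PUT /{index}/_warmer/{name}, optionally with a type
    segment between the index and _warmer.
    """
    parts = [index] + ([doc_type] if doc_type else []) + ['_warmer', name]
    return 'PUT', '/' + '/'.join(parts), {'query': query}
```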



I briefly looked through the Elasticsearch 2.0 source. As I expected, it is memory loading.


https://github.com/elastic/elasticsearch/blob/2.0/core/src/main/java/org/elasticsearch/indices/IndicesWarmer.java

https://github.com/elastic/elasticsearch/blob/2.0/core/src/main/java/org/elasticsearch/index/warmer/ShardIndexWarmerService.java

https://github.com/elastic/elasticsearch/blob/2.0/core/src/main/java/org/elasticsearch/index/shard/IndexShard.java

https://github.com/elastic/elasticsearch/blob/2.0/core/src/main/java/org/elasticsearch/search/SearchService.java

https://github.com/elastic/elasticsearch/blob/2.0/core/src/main/java/org/elasticsearch/index/cache/bitset/BitsetFilterCache.java


For example, when the SearchService class is initialized, it adds NormsWarmer, FieldDataWarmer, and SearchWarmer listeners to indicesWarmer. Internally, the code is written so that the listeners run when index.warmer.enabled is true.


this.indicesWarmer.addListener(new NormsWarmer());
this.indicesWarmer.addListener(new FieldDataWarmer());
this.indicesWarmer.addListener(new SearchWarmer());



In the case of FieldDataWarmer, it loads the index's field mapping information into memory.


warmUp.put(indexName, fieldMapper.fieldType());

for (final MappedFieldType fieldType : warmUp.values()) {
    executor.execute(new Runnable() {
        @Override
        public void run() {
            try {
                final long start = System.nanoTime();
                indexFieldDataService.getForField(fieldType).load(ctx);
...
}



And ShardIndexWarmerService accumulates simple warmer-related statistics.


Posted by '김용환'
,



When searching Elasticsearch, sometimes you only want the count, not the contents. In that case, set the size parameter to 0 and you can get the result easily. If you do not pass the size parameter, the default of 10 is used and you have to wade through all the result bodies. Whether to gain a little speed or just to skip the long-winded output, using the size parameter seems worthwhile.



curl -XGET http://search.google.com:9200/index_name/type_name/_search?pretty=true\&size=0 -d '{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "range": {
          "@timestamp": {
            "from": "now-31d/d",
            "to": "now-1d/d"
          }
        }
      }
    }
  }
}'

Result:

{
  "took" : 5,
  "timed_out" : false,
  "_shards" : {
    "total" : 20,
    "successful" : 20,
    "failed" : 0
  },
  "hits" : {
    "total" : 184821,
    "max_score" : 0.0,
    "hits" : [ ]
  }
}
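In Python the same trick is a one-line body tweak. A sketch with a helper for reading the total back out; in 1.x, hits.total is a plain integer, as in the result above:

```python
def count_only(body):
    """Return a copy of a search body that asks for no hit contents."""
    return dict(body, size=0)

def total_hits(response):
    """Read the total hit count from a search response."""
    return response['hits']['total']
```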






Posted by '김용환'
,


The official Elasticsearch 2.0 was finally released on October 28. I am amazed at the incredibly fast release pace.


http://mvnrepository.com/artifact/org.elasticsearch/elasticsearch/2.0.0



https://www.elastic.co/blog/elasticsearch-2-0-0-released



(For reference, Elasticsearch 3.0 is also in progress... looking at the source or the issues, it is no joke.)


It has improved a great deal and there are good plugins too. The details are below.

Elasticsearch 2.0.0 GA released

With 2,799 pull requests by 477 committers added since the release of Elasticsearch 1.0.0, we are proud to announce the release of Elasticsearch 2.0.0 GA, based on Lucene 5.2.1.

As if that were not enough, we are also releasing version 2.0.0 of the Shield security and Watcher alerting plugins, an all new streamlined Marvel monitoring plugin which is now free to use in production, and a new open source Sense editor.

You can download Elasticsearch 2.0.0 and read about the important breaking changes in 2.0.0 here. The full changes list can be found here:

Change logs for the commercial plugins can be found here:

New in Elasticsearch

Elasticsearch 2.0.0 delivers awesome new features, such as:

Pipeline Aggregations

The ability to run aggregations such as derivatives, moving averages, and series arithmetic on the results of other aggregations. This functionality was always doable client-side, but pushing the computation into Elasticsearch makes it easier to build more powerful analytic queries, while simplifying client code considerably. It opens up the potential for predictive analytics and anomaly detection. You can read more about Pipeline Aggregations in:

Query/Filter merging

Filters are no more. All filters clauses have now become query clauses instead. When used in query context, they have an effect on relevance scoring and, when used in filter context, they simply exclude documents which don’t match, just like filters do today. This restructuring means that query execution can be automatically optimized to run in the most efficient order possible. For instance, slow queries like phrase and geo queries first execute a fast approximate phase, then trim the results with a slower exact phase. In filter context, frequently used clauses will be cached automatically whenever it makes sense to do so. You can read more in “Better query execution coming to Elasticsearch 2.0”.

Configurable store compression

Stored fields like the _source field can be compressed either with LZ4 for speed (default), or with DEFLATE for reduced index size. This is particularly useful for logging, where old indices can switch to best_compression before being optimized. You can read more in “Store compression in Lucene and Elasticsearch”.

Hardening

Elasticsearch now runs under the Java Security Manager, which marks a huge leap forward in terms of security. The Security Manager makes Elasticsearch harder to exploit and severely restricts the impact that any hacker could have on your system. Elasticsearch has also been hardened from an indexing perspective:

  • Documents are now fsynced to disk before indexing requests are acknowledged making writes durable-by-default.
  • All files are checksummed to detect corruption early.
  • All file renames are atomic to prevent any partially written files.

Finally, a much requested change from system administrators to prevent an unconfigured node from joining a public network: Elasticsearch now binds to localhost only by default, and multicast has been removed in favour of unicast.

Performance and resilience

Besides the above, there are a multitude of smaller changes both in Elasticsearch and Lucene that add up to a more stable, reliable, easy-to-configure system, for example:

  • Lower heap usage with doc-values-by-default, reduced memory usage during merges, and roaring bitsets for filter caching.
  • Structured, readable, exceptions.
  • More reliance on feedback loops instead of settings for auto-regulation.
  • A big cleanup of type mappings to make them safe, unambiguous, and reliable.
  • Cluster state diffs for faster change propagation and more stable large clusters.
  • Improved compression of norms, previously a big user of heap space.
  • Auto-throttling of merges, without needing to tweak arcane settings.
  • More fine-grained Lucene memory reporting.
  • Parent/child rewritten to take advantage of optimal query execution.
Core plugins

The officially supported core plugins now ship at the same time and with the same version number as Elasticsearch core. No longer will you need to look at a complicated version matrix to figure out which plugin version to install. Instead, the core plugins can be installed as follows:

bin/plugin install analysis-icu

New in Shield and Watcher

Our commercial plugins ship with some cool new features, like:

Shield

  • Field- and Document-level access control.
  • User impersonation.
  • Custom extendable authentication realms.
Watcher

  • Activate/Deactivate individual watches.
  • Notifications in Slack and Hipchat.

You can read more about these features in “Shield, Watcher, and Marvel 2.0.0 GA Released”.

Like the core open source plugins, our commercial plugins are now released at the same time and with the same version number as Elasticsearch core, and they can be installed as follows:

bin/plugin install license
bin/plugin install shield
bin/plugin install watcher

Marvel 2.0.0 free to use in production

The Marvel monitoring plugin has been invaluable to our customers, helping them both to diagnose problems after the fact and to spot issues while they are evolving. We have taken a good hard look at what can be improved and have rewritten Marvel from scratch:

  • The Marvel UI is now built on top of the all new Kibana platform.
  • Dashboards have been streamlined to show the most important metrics, making problems easier to spot.
  • Marvel now supports monitoring of multiple clusters from a single installation, as a commercial feature.

And the best part - Marvel is now free to use in production for all Elasticsearch users. A license is required, but is available to all users free of charge. If you require multi-cluster monitoring support, that is a commercial feature.

You can read more about Marvel in "Shield, Watcher, and Marvel 2.0.0 GA Released".

Open source Sense editor

Sense, the browser-based Elasticsearch request and DSL editor, is now available to all as an open source app built on top of the Kibana platform. This new release adds some great new features:

  • Paste multiple cURL requests to convert to Sense syntax.
  • Copy multiple Sense requests in cURL syntax.
  • Execute multiple requests in one go.
  • Autocompletion database updated to support Elasticsearch 2.0.

Sense can be installed as a Kibana app as:

./bin/kibana plugin --install elastic/sense

You can read more about Sense in "The Story of Sense - Announcing Sense 2.0.0-beta1".

Elasticsearch Migration Plugin

The Elasticsearch Migration Plugin is the best starting point if you are upgrading from Elasticsearch 1.x to 2.0. It installs as a site plugin in any 1.x Elasticsearch cluster, and will detect issues that need to be resolved before upgrading such as ancient Lucene 3 indices and problematic mappings (see “The Great Mapping Refactoring”) that will no longer work in Elasticsearch 2.0.0.

You can find the instructions for this plugin in the Elasticsearch Migration repository.

Conclusion

Please download Elasticsearch 2.0.0, try it out, and let us know what you think on Twitter (@elastic) or in our forum. You can report any problems on the GitHub issues page.


Posted by '김용환'
,