[elasticsearch] optimize api

Elasticsearch 2015. 8. 26. 15:14

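The optimize API optimizes one or more indices (multiple indices are supported), and it is a call you make explicitly to force the operation. The optimization process basically restructures the index for faster search operations. The optimize API blocks until it completes (it is not asynchronous), and if the HTTP connection is lost, the request keeps running in the background.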

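If a new optimize request comes in while one is running, it blocks until the ongoing optimization completes. Search or delete requests that come in during that time are still accepted by Elasticsearch.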
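The blocking behavior can be pictured with a toy model. This is a sketch, not Elasticsearch code: it assumes optimize calls on a shard are serialized by a single lock, similar to the optimizeLock taken in InternalEngine.forceMerge(), while searches never take that lock. The class OptimizeLockDemo and its methods are hypothetical.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

// Toy model (not Elasticsearch code): one lock serializes optimize calls on a shard,
// while searches never take that lock.
public class OptimizeLockDemo {
    private final ReentrantLock optimizeLock = new ReentrantLock();
    private final List<String> events = new CopyOnWriteArrayList<>();

    // Waits for any running optimize to finish, then "merges" while holding the lock.
    void optimize(String name, CountDownLatch started, CountDownLatch mergeDone) throws InterruptedException {
        optimizeLock.lock();
        try {
            events.add(name + " start");
            started.countDown();
            mergeDone.await(); // stands in for a long-running merge
            events.add(name + " end");
        } finally {
            optimizeLock.unlock();
        }
    }

    // Searches do not touch optimizeLock, so they are served immediately.
    void search() {
        events.add("search");
    }

    private Thread startOptimize(String name, CountDownLatch started, CountDownLatch mergeDone) {
        Thread t = new Thread(() -> {
            try {
                optimize(name, started, mergeDone);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        t.start();
        return t;
    }

    static List<String> runScenario() {
        OptimizeLockDemo shard = new OptimizeLockDemo();
        CountDownLatch firstStarted = new CountDownLatch(1);
        CountDownLatch firstMergeDone = new CountDownLatch(1);
        try {
            Thread first = shard.startOptimize("optimize-1", firstStarted, firstMergeDone);
            firstStarted.await(); // optimize-1 now holds the lock
            // optimize-2 waits for the lock; its own merge finishes immediately once it gets in
            Thread second = shard.startOptimize("optimize-2", new CountDownLatch(1), new CountDownLatch(0));
            shard.search();             // accepted while optimize-1 is still running
            firstMergeDone.countDown(); // optimize-1 finishes and releases the lock
            first.join();
            second.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return shard.events;
    }

    public static void main(String[] args) {
        System.out.println(runScenario());
    }
}
```

Running main prints [optimize-1 start, search, optimize-1 end, optimize-2 start, optimize-2 end]: the search is served while optimize-1 runs, and optimize-2 only starts once optimize-1 releases the lock.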

$ curl -XPOST 'http://localhost:9200/poi/_optimize'

Request parameters

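* max_num_segments : the number of segments to optimize down to. The default is -1, which only checks whether a merge is needed and runs it if so. To fully optimize the index, set it to 1, which means the index is made searchable as a single segment.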

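* only_expunge_deletes : specifies whether the optimize operation should only expunge segments that contain deleted documents. Defaults to false. In Lucene, a document is not removed from its segment right away; it is only marked as deleted.

During a segment merge, the new segment is created without the documents marked as deleted (new wine goes into new wineskins). This option therefore restricts the merge to segments that contain deletes.

The flag does not override the index.merge.policy.expunge_deletes_allowed threshold. Unlike that index-level setting, it applies only to the optimize API call.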

* flush : whether a flush should be performed after the optimization. Defaults to true.
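The parameters above are plain query-string parameters on the _optimize endpoint used in the curl example. As a sketch, the hypothetical helper OptimizeUrl.build below only assembles the URL string (it sends nothing); a parameter passed as null is left out so the server-side default applies.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: builds the optimize endpoint URL with the request parameters
// described above. It only constructs the string; it does not send the request.
public class OptimizeUrl {
    static String build(String host, String index, Integer maxNumSegments,
                        Boolean onlyExpungeDeletes, Boolean flush) {
        // Insertion order is kept so the query string is predictable.
        Map<String, String> params = new LinkedHashMap<>();
        if (maxNumSegments != null) params.put("max_num_segments", maxNumSegments.toString());
        if (onlyExpungeDeletes != null) params.put("only_expunge_deletes", onlyExpungeDeletes.toString());
        if (flush != null) params.put("flush", flush.toString());

        StringBuilder url = new StringBuilder(host).append('/').append(index).append("/_optimize");
        if (!params.isEmpty()) {
            StringJoiner query = new StringJoiner("&", "?", "");
            params.forEach((k, v) -> query.add(k + "=" + v));
            url.append(query);
        }
        return url.toString();
    }

    public static void main(String[] args) {
        // Fully optimize the poi index down to a single segment.
        System.out.println(build("http://localhost:9200", "poi", 1, null, null));
        // Only merge away segments that contain deleted documents, and skip the flush.
        System.out.println(build("http://localhost:9200", "poi", null, true, false));
    }
}
```

The resulting strings can be passed to curl -XPOST, for example curl -XPOST 'http://localhost:9200/poi/_optimize?max_num_segments=1'.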

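Segment merging, done to improve search performance, purges the documents that are marked as deleted and merges the remaining documents into a larger segment. Once the merge finishes, the old segments that were merged are no longer used, so they are deleted. Merging consumes a lot of CPU and memory, so Elasticsearch may not be a good fit for services that delete documents heavily.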
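The merge described above can be illustrated with a toy model. It is a simplification, not Lucene's implementation: a segment is just a list of documents, a delete only sets a flag, and a merge copies the live documents into one new segment. The names SegmentMergeDemo, Doc, Segment and merge are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a segment merge (not Lucene's implementation): deleted documents are
// only flagged in their segment, and a merge copies the live ones into a new segment.
public class SegmentMergeDemo {
    record Doc(String id, boolean deleted) {}

    record Segment(List<Doc> docs) {}

    // Merges the given segments into one new segment, dropping documents marked as deleted.
    // The caller would then discard the old segments, which are no longer used.
    static Segment merge(List<Segment> segments) {
        List<Doc> live = new ArrayList<>();
        for (Segment s : segments) {
            for (Doc d : s.docs()) {
                if (!d.deleted()) {
                    live.add(d);
                }
            }
        }
        return new Segment(List.copyOf(live));
    }

    public static void main(String[] args) {
        Segment s1 = new Segment(List.of(new Doc("1", false), new Doc("2", true)));
        Segment s2 = new Segment(List.of(new Doc("3", false), new Doc("4", true), new Doc("5", false)));
        Segment merged = merge(List.of(s1, s2));
        // Ids of the documents that survive the merge.
        System.out.println(merged.docs().stream().map(Doc::id).toList());
    }
}
```

Running main prints [1, 3, 5]: documents 2 and 4 were only flagged as deleted in their original segments and disappear only when the merge copies the live documents out.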

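Also, optimize is meant for static indices, not dynamic ones that are actively being updated. Forcing an optimize call while the index is already merging segments in the background may therefore do more harm than good.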

Sources

https://www.elastic.co/guide/en/elasticsearch/guide/current/merge-process.html

https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-optimize.html

+ From experience

To add a bit more from the source code: if you trace the optimize call down, it ends up calling InternalEngine.forceMerge(). Reading this method makes the behavior described above easier to understand.

https://github.com/elastic/elasticsearch/blob/master/core/src/main/java/org/elasticsearch/index/engine/InternalEngine.java

@Override
public void forceMerge(final boolean flush, int maxNumSegments, boolean onlyExpungeDeletes,
                       final boolean upgrade, final boolean upgradeOnlyAncientSegments) throws EngineException {
    /*
     * We do NOT acquire the readlock here since we are waiting on the merges to finish
     * that's fine since the IW.rollback should stop all the threads and trigger an IOException
     * causing us to fail the forceMerge
     *
     * The way we implement upgrades is a bit hackish in the sense that we set an instance
     * variable and that this setting will thus apply to the next forced merge that will be run.
     * This is ok because (1) this is the only place we call forceMerge, (2) we have a single
     * thread for optimize, and the 'optimizeLock' guarding this code, and (3) ConcurrentMergeScheduler
     * syncs calls to findForcedMerges.
     */
    assert indexWriter.getConfig().getMergePolicy() instanceof ElasticsearchMergePolicy : "MergePolicy is " + indexWriter.getConfig().getMergePolicy().getClass().getName();
    ElasticsearchMergePolicy mp = (ElasticsearchMergePolicy) indexWriter.getConfig().getMergePolicy();
    optimizeLock.lock();
    try {
        ensureOpen();
        if (upgrade) {
            logger.info("starting segment upgrade upgradeOnlyAncientSegments={}", upgradeOnlyAncientSegments);
            mp.setUpgradeInProgress(true, upgradeOnlyAncientSegments);
        }
        store.incRef(); // increment the ref just to ensure nobody closes the store while we optimize
        try {
            if (onlyExpungeDeletes) {
                assert upgrade == false;
                indexWriter.forceMergeDeletes(true /* blocks and waits for merges*/);
            } else if (maxNumSegments <= 0) {
                assert upgrade == false;
                indexWriter.maybeMerge();
            } else {
                indexWriter.forceMerge(maxNumSegments, true /* blocks and waits for merges*/);
            }
            if (flush) {
                flush(true, true);
            }
            if (upgrade) {
                logger.info("finished segment upgrade");
            }
        } finally {
            store.decRef();
        }
    } catch (Throwable t) {
        ForceMergeFailedEngineException ex = new ForceMergeFailedEngineException(shardId, t);
        maybeFailEngine("force merge", ex);
        throw ex;
    } finally {
        try {
            mp.setUpgradeInProgress(false, false); // reset it just to make sure we reset it in a case of an error
        } finally {
            optimizeLock.unlock();
        }
    }
}
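The branch in the middle of forceMerge() is what maps the request parameters onto Lucene IndexWriter calls. The sketch below restates only that decision as a standalone function that returns the names of the calls that would be made; ForceMergeDispatch and plan are hypothetical names, and locking, segment upgrades and error handling are left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical restatement of the parameter handling in InternalEngine.forceMerge()
// (locking, upgrades and error handling omitted).
public class ForceMergeDispatch {
    static List<String> plan(int maxNumSegments, boolean onlyExpungeDeletes, boolean flush) {
        List<String> calls = new ArrayList<>();
        if (onlyExpungeDeletes) {
            // only_expunge_deletes=true takes precedence; max_num_segments is not used
            calls.add("indexWriter.forceMergeDeletes(true)");
        } else if (maxNumSegments <= 0) {
            // default -1: let the merge policy decide whether any merge is needed
            calls.add("indexWriter.maybeMerge()");
        } else {
            // explicit target, e.g. 1 for a full optimize; blocks until merges finish
            calls.add("indexWriter.forceMerge(" + maxNumSegments + ", true)");
        }
        if (flush) {
            // engine-level flush, run after the merge when flush=true (the default)
            calls.add("flush(true, true)");
        }
        return calls;
    }

    public static void main(String[] args) {
        System.out.println(plan(-1, false, true)); // all defaults
        System.out.println(plan(1, false, true));  // max_num_segments=1
        System.out.println(plan(1, true, false));  // only_expunge_deletes=true&flush=false
    }
}
```

Read this way, the code shows that when only_expunge_deletes is true, max_num_segments is not used, and that the default max_num_segments of -1 does not force a merge but only calls maybeMerge().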

Posted by '김용환'
