1. Introduction


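HBase is adding a Java VM agent that lets a region server process lock its memory address space. The feature is referred to as mlockall.

The code is quite experimental.

As of April 8, 2013 it is not included in the stable 0.94.x releases, but the next release, 0.95 (or perhaps 0.98?), is expected to use mlockall, so here is a brief look at it.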


Related material


https://issues.apache.org/jira/browse/HBASE-4391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel





2. What mlockall means


mlockall is a C function that guarantees the memory mapped by a user application stays resident in RAM.


http://linux.die.net/man/2/mlockall


mlockall() locks all pages mapped into the address space of the calling process. This includes the pages of the code, data and stack segment, as well as shared libraries, user space kernel data, shared memory, and memory-mapped files. All mapped pages are guaranteed to be resident in RAM when the call returns successfully; the pages are guaranteed to stay in RAM until later unlocked.
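As a rough illustration of the call described in the man page, here is a minimal sketch in C. The function names lock_address_space and unlock_address_space are made up for this example; only mlockall(), munlockall() and the MCL_* flags come from the system headers. Without sufficient privileges or a large enough memlock limit, the call is expected to fail and the sketch simply reports the errno value.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical helper: lock the pages selected by `flags` into RAM.
 * MCL_CURRENT covers pages mapped now, MCL_FUTURE covers later mappings.
 * Returns 0 on success, or the errno value reported by mlockall(). */
int lock_address_space(int flags) {
  /* At least one of the two flags must be requested. */
  assert((flags & (MCL_CURRENT | MCL_FUTURE)) != 0);

  if (mlockall(flags) != 0) {
    int err = errno;
    fprintf(stderr, "mlockall failed: %s\n", strerror(err));
    return err;
  }
  return 0;
}

/* Hypothetical helper: release every lock held by this process.
 * Returns 0 on success, or the errno value reported by munlockall(). */
int unlock_address_space(void) {
  return munlockall() == 0 ? 0 : errno;
}
```

A caller would typically invoke lock_address_space(MCL_CURRENT | MCL_FUTURE) once, early in process startup, and treat a non-zero return as "memory is not locked".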






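3. Details


(1) Background


When a machine runs more jobs than its physical memory can hold, swapping occurs, and as a result the Region Server can run into trouble or lose its connection to ZooKeeper. mlockall is used so that the process's memory stays in RAM and is never swapped out.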

 



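(2) How to run the daemon

Download the HBase source and build it with Maven using the -Pnative profile; the build produces libmlockall_agent.so.


Pass the module to the JVM with -agentpath, followed by its options in key=value form so that the agent can parse them.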

export HBASE_REGIONSERVER_OPTS="-agentpath:./libmlockall_agent.so=user=hbase" 


Start the HBase region server with root privileges.

hbase --mlock user=hbase regionserver start


If it is not started as root, the following log message is printed.

Unable to boost memlock resource limit: Operation not permitted
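The message above comes from a failed attempt to raise RLIMIT_MEMLOCK. A minimal sketch for inspecting that limit and trying to raise it the same way is shown here; the function names are hypothetical, and the expectation that an unprivileged process gets EPERM is an assumption about a typical Linux setup.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper: read the current memlock limits into `out`.
 * Returns 0 on success, -1 on failure (errno is set by getrlimit). */
int get_memlock_limit(struct rlimit *out) {
  assert(out != NULL);
  return getrlimit(RLIMIT_MEMLOCK, out);
}

/* Print one limit; RLIM_INFINITY is shown as "unlimited". */
void print_limit(const char *label, rlim_t value) {
  if (value == RLIM_INFINITY) {
    printf("%s: unlimited\n", label);
  } else {
    printf("%s: %llu bytes\n", label, (unsigned long long)value);
  }
}

/* Try to raise both the soft and the hard limit to infinity.
 * Returns 0 on success, or the errno value from setrlimit(). */
int boost_memlock_limit(void) {
  struct rlimit lim;
  lim.rlim_max = RLIM_INFINITY;
  lim.rlim_cur = lim.rlim_max;
  return setrlimit(RLIMIT_MEMLOCK, &lim) == 0 ? 0 : errno;
}
```

Calling get_memlock_limit() followed by print_limit("soft", lim.rlim_cur) and print_limit("hard", lim.rlim_max) shows how much memory the process may lock before any boost is attempted.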



* Explanation

HBase is started as root, but the agent then drops privileges to the hbase user with setuid.
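The privilege drop described above can be sketched roughly as follows. drop_privileges is a hypothetical name; the order of calls (supplementary groups, then group ID, then user ID) mirrors the agent source quoted later, and the final check that root cannot be regained is an extra precaution added for this example.

```c
#define _DEFAULT_SOURCE /* exposes initgroups() on glibc */
#include <assert.h>
#include <grp.h>
#include <pwd.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: switch the process from root to `user`.
 * Returns 0 on success, -1 on failure. The group changes must happen
 * while the process is still root, so setuid() comes last. */
int drop_privileges(const char *user) {
  assert(user != NULL);

  struct passwd *pwd = getpwnam(user);
  if (pwd == NULL) {
    fprintf(stderr, "could not find user '%s'\n", user);
    return -1;
  }
  /* Copy the IDs: `pwd` points at static storage. */
  uid_t uid = pwd->pw_uid;
  gid_t gid = pwd->pw_gid;

  if (initgroups(user, gid) != 0) { perror("initgroups"); return -1; }
  if (setgid(gid) != 0)           { perror("setgid");     return -1; }
  if (setuid(uid) != 0)           { perror("setuid");     return -1; }

  /* Extra check (not in the agent): root must not be recoverable. */
  if (uid != 0 && setuid(0) == 0) {
    fprintf(stderr, "privileges were not dropped permanently\n");
    return -1;
  }
  return 0;
}
```

In the agent this step runs only after the memlock limit has been raised and mlockall has succeeded, since both of those need root.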




(3) Implementation

The agent is native code built on JVMTI.
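With -agentpath:<library>=<options>, the JVM passes the options text to the agent's Agent_OnLoad function as a single string. A minimal sketch of extracting the user value from such a string is shown here, assuming a comma-separated key=value format. agent_option is a hypothetical helper written for this example, not the parser used in the HBase source; unlike a strtok-based parser, it treats everything after the first = as the value and skips keys that have no value.

```c
#define _POSIX_C_SOURCE 200809L /* strdup, strtok_r */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a malloc'ed copy of the value stored under
 * `wanted` in a "key=value,key=value" string, or NULL if it is absent.
 * The caller frees the result. */
char *agent_option(const char *options, const char *wanted) {
  assert(wanted != NULL);
  if (options == NULL) return NULL;

  char *dup = strdup(options); /* strtok_r modifies its input */
  if (dup == NULL) return NULL;

  char *result = NULL;
  char *save = NULL;
  for (char *tok = strtok_r(dup, ",", &save); tok != NULL;
       tok = strtok_r(NULL, ",", &save)) {
    char *eq = strchr(tok, '=');
    if (eq == NULL) continue; /* key without a value: skip it */
    *eq = '\0';               /* split "key=value" in place */
    if (strcmp(tok, wanted) == 0) {
      free(result);           /* the last occurrence wins */
      result = strdup(eq + 1);
    }
  }
  free(dup);
  return result;
}
```

For the option string used in this post, agent_option("user=hbase", "user") yields "hbase".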


Source:

http://svn.apache.org/viewvc/hbase/trunk/hbase-server/src/main/native/src/mlockall_agent/mlockall_agent.c?view=markup&pathrev=1353289


/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

/**
 * mlockall_agent is a simple VM Agent that allows to lock the address space of 
 * the process. This avoids the process' memory eviction under pressure.
 *
 * One example is when on the same machine you run the Region Server and 
 * some map-reduce tasks, some unused data in the region server may get swapped 
 * and this affects the region server performance.
 * 
 * You can load the agent by adding it as a jvm option:
 * export HBASE_REGIONSERVER_OPTS="-agentpath:./libmlockall_agent.so=user=hbase"
 */


#include <libgen.h>
#include <grp.h>
#include <pwd.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>
#include "jvmti.h"

typedef struct opts {
  char *setuid_user;
} opts_t;


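#define PREFIX "mlockall_agent: "
/* ##__VA_ARGS__ (a GNU extension) forwards the arguments to fprintf and
 * drops the trailing comma when there are none. */
#define LOG(fmt, ...) { fprintf(stderr, PREFIX fmt, ##__VA_ARGS__); }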


static int parse_options (const char *options, opts_t *parsed) {
  char *optr, *opts_dup;
  char *save2 = NULL;
  char *save = NULL;
  char *key, *val;
  int ret = 0;
  char *tok;

  memset(parsed, 0, sizeof(opts_t));
  if ((opts_dup = strdup(options)) == NULL)
    return(-1);

  optr = opts_dup;
  while ((tok = strtok_r(optr, ",", &save)) != NULL) {
    optr = NULL;
    save2 = NULL;

    key = strtok_r(tok, "=", &save2);
    val = strtok_r(NULL, "=", &save2);
    if (!strcmp(key, "user")) {
      parsed->setuid_user = strdup(val);
    } else {
      LOG("Unknown agent parameter '%s'\n", key);
      ret = 1;
    }
  }

  free(opts_dup);
  return(ret);
}


static void warn_unless_root() {
  if (geteuid() != 0) {
    LOG("(this may be because java was not run as root!)\n");
  }
}


JNIEXPORT jint JNICALL Agent_OnLoad(JavaVM *vm, char *init_str, void *reserved) {
  struct passwd *pwd = NULL;
  opts_t opts;

  if (parse_options(init_str, &opts)) {
    return(1);
  }

  // Check that the target user for setuid is specified if current user is root
  if (opts.setuid_user == NULL) {
    LOG("Unable to setuid: specify a target username as the agent option user=<username>\n");
    return(1);
  }

  // Check that this user exists
  if ((pwd = getpwnam(opts.setuid_user)) == NULL) {
    LOG("Unable to setuid: could not find user '%s'\n", opts.setuid_user);
    return(1);
  }

  // Boost the mlock limit up to infinity
  struct rlimit lim;
  lim.rlim_max = RLIM_INFINITY;
  lim.rlim_cur = lim.rlim_max;
  if (setrlimit(RLIMIT_MEMLOCK, &lim)) {
    perror(PREFIX "Unable to boost memlock resource limit");
    warn_unless_root();
    return(1);
  }

  // Actually lock our memory, including future allocations.
  if (mlockall(MCL_CURRENT | MCL_FUTURE)) {
    perror(PREFIX "Unable to lock memory.");
    warn_unless_root();
    return(1);
  }

  // Drop down to the user's supplemental group list
  if (initgroups(opts.setuid_user, pwd->pw_gid)) {
    perror(PREFIX "Unable to initgroups");
    warn_unless_root();
    return(1);
  }

  // And primary group ID
  if (setgid(pwd->pw_gid)) {
    perror(PREFIX "Unable to setgid");
    warn_unless_root();
    return(1);
  }

  // And user ID
  if (setuid(pwd->pw_uid)) {
    perror(PREFIX "Unable to setuid");
    warn_unless_root();
    return(1);
  }

  LOG("Successfully locked memory and setuid to %s\n", opts.setuid_user);
  return(0);
}
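One way to check from inside a process whether locking took effect is to read the VmLck and VmSwap fields of /proc/self/status. This is a sketch that assumes a Linux system exposing those fields; the function names are hypothetical and nothing like this is part of the agent.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: parse one status line such as "VmLck:\t  1024 kB".
 * Returns the value in kB, or -1 if the line is not the wanted field. */
long status_field_kb(const char *line, const char *field) {
  assert(line != NULL && field != NULL);

  size_t n = strlen(field);
  if (strncmp(line, field, n) != 0 || line[n] != ':') return -1;

  long kb = -1;
  if (sscanf(line + n + 1, "%ld", &kb) != 1) return -1;
  return kb;
}

/* Hypothetical helper: look up a field (for example "VmLck" or "VmSwap")
 * for the current process. Returns -1 if the file or field is missing. */
long self_status_kb(const char *field) {
  FILE *fp = fopen("/proc/self/status", "r");
  if (fp == NULL) return -1;

  char line[256];
  long kb = -1;
  while (kb < 0 && fgets(line, sizeof line, fp) != NULL) {
    kb = status_field_kb(line, field);
  }
  fclose(fp);
  return kb;
}
```

After a successful mlockall, self_status_kb("VmLck") would be expected to report a non-zero amount and self_status_kb("VmSwap") to stay at zero. For a running region server the same fields can be read from /proc/<pid>/status.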





4. See also

Elasticsearch has a similar feature.


http://www.elasticsearch.org/guide/reference/setup/installation/



Memory Settings

There is an option to use mlockall to try and lock the process address space so it won’t be swapped. For this to work, the bootstrap.mlockall should be set to true and it is recommended to set both the min and max memory allocation to be the same.

In order to see if this works or not, set the common.jna logging to DEBUG level. A solution to “Unknown mlockall error 0” can be to set ulimit -l unlimited.

Note, this is experimental feature, and might cause the JVM or shell session to exit if failing to allocate the memory (because not enough memory is available on the machine).





Posted by '김용환'