Posts tagged 'Hadoop'

3 posts tagged 'Hadoop'

2011.08.23 Installing Hadoop 0.21.0 on Cygwin
2011.08.22 [Hadoop install error] When bin/hadoop namenode -format fails
2008.05.14 Map Reduce 2

Installing Hadoop 0.21.0 on Cygwin

nosql 2011. 8. 23. 10:51

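<Notes on testing multiple versions>

Hadoop uses /tmp on the install machine as the DFS storage directory. On Windows it follows the drive where Cygwin is installed:
if Cygwin is installed on d:\, the directory is d:\tmp.
Different versions may not recognize each other's directories, so if the namenode fails to start, delete the DFS data under /tmp completely and start over.
Run bin/hadoop namenode -format so the directory is created afresh, then start the daemons again.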
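
The cleanup step can be sketched as a small helper. This is only a minimal sketch, assuming the storage location is /tmp/hadoop-<user> (the logs on this page show /tmp/hadoop-nhn). The function names hadoop_tmp_dir and reset_dfs_dir are made up for illustration, and on Cygwin the root would be the /tmp of the Cygwin install drive.

```python
import getpass
import shutil
from pathlib import Path


def hadoop_tmp_dir(user=None, root="/tmp"):
    """Return the assumed storage directory, <root>/hadoop-<user>."""
    return Path(root) / f"hadoop-{user or getpass.getuser()}"


def reset_dfs_dir(path):
    """Delete the directory tree if it exists; return True if something was removed."""
    path = Path(path)
    if not path.is_dir():
        return False
    shutil.rmtree(path)
    return True
```

After calling reset_dfs_dir(hadoop_tmp_dir()), the file system has to be formatted again before any daemon is started.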

* Testing version 0.20.2
Works fine.

* Testing version 0.20.203.0

Running bin/hadoop tasktracker fails:

11/08/22 19:01:37 ERROR mapred.TaskTracker: Can not start task tracker because java.io.IOException: Failed to set permissions of path: /tmp/hadoop-nhn/mapred/local/ttprivate to 0700

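I did not look into this any further and moved on.

* Testing version 0.21.0
A new classpath policy seems to have been introduced. Warnings are printed, but everything can still be used as before.

<Installation references>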

http://hadoop.apache.org/common/docs/stable/single_node_setup.html

http://cardia.tistory.com/entry/Hadoop-0202-%EC%84%A4%EC%B9%98-%EB%B0%8F-%ED%99%98%EA%B2%BD%EC%84%A4%EC%A0%95

http://v-lad.org/Tutorials/Hadoop/00%20-%20Intro.html

http://developer.yahoo.com/hadoop/tutorial/module3.html

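<Installation steps>
1. Install Cygwin and set up OpenSSH (I use Cygwin all the time, so the details are omitted).
2. From Cygwin, go to the Hadoop download page (http://www.apache.org/dyn/closer.cgi/hadoop/common/).
3. Download Hadoop 0.21.0 or 0.20.2 and unpack it under "<cygwin install directory>/home/<account name>/".
4. Open a Cygwin shell.
5. Configure ssh in Cygwin.

(Answer the yes/no prompts along the way with care.)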

$ ssh-host-config
*** Info: Generating /etc/ssh_host_key
*** Info: Generating /etc/ssh_host_rsa_key
*** Info: Generating /etc/ssh_host_dsa_key
*** Info: Generating /etc/ssh_host_ecdsa_key
*** Info: Creating default /etc/ssh_config file
*** Info: Creating default /etc/sshd_config file
*** Info: Privilege separation is set to yes by default since OpenSSH 3.3.
*** Info: However, this requires a non-privileged account called 'sshd'.
*** Info: For more info on privilege separation read /usr/share/doc/openssh/README.privsep.
*** Query: Should privilege separation be used? (yes/no) no
*** Info: Updating /etc/sshd_config file

*** Warning: The following functions require administrator privileges!

*** Query: Do you want to install sshd as a service?
*** Query: (Say "no" if it is already installed as a service) (yes/no) yes
*** Query: Enter the value of CYGWIN for the daemon: [] yes
*** Info: On Windows Server 2003, Windows Vista, and above, the
*** Info: SYSTEM account cannot setuid to other users -- a capability
*** Info: sshd requires. You need to have or to create a privileged
*** Info: account. This script will help you do so.

*** Info: You appear to be running Windows XP 64bit, Windows 2003 Server,
*** Info: or later. On these systems, it's not possible to use the LocalSystem
*** Info: account for services that can change the user id without an
*** Info: explicit password (such as passwordless logins [e.g. public key
*** Info: authentication] via sshd).

*** Info: If you want to enable that functionality, it's required to create
*** Info: a new account with special privileges (unless a similar account
*** Info: already exists). This account is then used to run these special
*** Info: servers.

*** Info: Note that creating a new user requires that the current account
*** Info: have Administrator privileges itself.

*** Info: No privileged account could be found.

*** Info: This script plans to use 'cyg_server'.
*** Info: 'cyg_server' will only be used by registered services.
*** Query: Do you want to use a different name? (yes/no) yes
*** Query: Enter the new user name: ntsec
*** Query: Reenter: ntsec

*** Query: Create new privileged user account 'ntsec'? (yes/no) no
*** ERROR: There was a serious problem creating a privileged user.
*** Query: Do you want to proceed anyway? (yes/no) yes
*** Warning: Expected privileged user 'ntsec' does not exist.
*** Warning: Defaulting to 'SYSTEM'

*** Info: The sshd service has been installed under the LocalSystem
*** Info: account (also known as SYSTEM). To start the service now, call
*** Info: `net start sshd' or `cygrunsrv -S sshd'. Otherwise, it
*** Info: will start automatically after the next reboot.

*** Info: Host configuration finished. Have fun!

Start the daemon, generate a key, and test:
$ net start sshd
The CYGWIN sshd service is starting.
The CYGWIN sshd service was started successfully.

$ ssh-keygen

$ cd ~/.ssh

$ cat id_rsa.pub >> authorized_keys

$ ssh localhost
The authenticity of host 'localhost (::1)' can't be established.
ECDSA key fingerprint is 71:6d:7b:51:aa:8a:2f:c1:c2:30:44:d1:b7:e0:f8:0e.
Are you sure you want to continue connecting (yes/no)?

$ ssh localhost
nhn@localhost's password:

$ logout
Connection to localhost closed.

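Check that sshd is running as a service (Control Panel > Administrative Tools > Services lists a service named CYGWIN sshd and shows whether it is running).

6. Set the JDK location
In conf/hadoop-env.sh:
export JAVA_HOME="/cygdrive/c/Progra~1/Java/jdk1.6.0_24"

7. Edit the Hadoop configuration files (under the conf directory)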

conf/core-site.xml:

<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>

conf/hdfs-site.xml:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>

conf/mapred-site.xml:

<configuration>
    <property>
        <name>mapred.job.tracker</name>
        <value>localhost:9001</value>
    </property>
</configuration>
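
A typo in one of these files is easy to miss, so it can help to read the values back before starting the daemons. The following is a minimal sketch in plain Python, not part of Hadoop: read_properties is a hypothetical helper, and CORE_SITE simply repeats the core-site.xml content shown above.

```python
import xml.etree.ElementTree as ET


def read_properties(xml_text):
    """Parse a Hadoop-style <configuration> document into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


# Same content as the conf/core-site.xml example above.
CORE_SITE = """
<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
"""

if __name__ == "__main__":
    print(read_properties(CORE_SITE))
```

For a real file, the text can be read from disk first and passed to the same function. A malformed file raises a parse error, which is the useful signal here.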

8. Start the daemons
Open Cygwin windows and run the daemons one at a time:

$ bin/hadoop namenode
$ bin/hadoop secondarynamenode
$ bin/hadoop jobtracker
$ bin/hadoop datanode
$ bin/hadoop tasktracker

9. Run a test job

$ bin/hadoop dfs -put conf input

$ bin/hadoop jar hadoop-mapred-examples-0.21.0.jar grep input output 'dfs[a-z]+'

Then open the admin ports and check that everything is working properly.

In version 0.21 the status pages are at the following addresses.
http://localhost:50070/dfshealth.jsp

http://localhost:50030/jobtracker.jsp
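
For reference, the grep example used in the test job counts how often each string matching the regular expression occurs in the input. Here is a minimal local sketch of the same computation in plain Python, with no Hadoop involved. The name grep_count is made up here, and breaking ties by name is an assumption, not something the Hadoop job guarantees.

```python
import re
from collections import Counter


def grep_count(lines, pattern):
    """Count each distinct regex match across all lines, most frequent first."""
    regex = re.compile(pattern)
    counts = Counter()
    for line in lines:
        # group(0) is the whole match, regardless of any groups in the pattern.
        counts.update(m.group(0) for m in regex.finditer(line))
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
```

Note that 'dfs[a-z]+' needs at least one lowercase letter right after "dfs", so a property name such as dfs.replication does not match.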


Posted by '김용환'

[Hadoop install error] When bin/hadoop namenode -format fails

nosql 2011. 8. 22. 18:21

bin/hadoop namenode -format kept failing with an error.

The cause was that conf/hdfs-site.xml did not contain
the following settings.

$ cat conf/hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
     <property>
         <name>dfs.replication</name>
         <value>1</value>
     </property>
</configuration>

Confirming that it now works:

$ bin/hadoop namenode -format
11/08/22 18:08:58 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = nhn-PC/10.64.49.213
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 0.20.203.0
STARTUP_MSG:   build = http://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20-security-203 -r 1099333; compiled by 'oom' on Wed May 4 07:57:50 PDT 2011
************************************************************/
Re-format filesystem in \tmp\hadoop-nhn\dfs\name ? (Y or N) Y   (type Y here)
11/08/22 18:09:00 INFO util.GSet: VM type       = 32-bit
11/08/22 18:09:00 INFO util.GSet: 2% max memory = 19.33375 MB
11/08/22 18:09:00 INFO util.GSet: capacity      = 2^22 = 4194304 entries
11/08/22 18:09:00 INFO util.GSet: recommended=4194304, actual=4194304
11/08/22 18:09:00 INFO namenode.FSNamesystem: fsOwner=nhn
11/08/22 18:09:00 INFO namenode.FSNamesystem: supergroup=supergroup
11/08/22 18:09:00 INFO namenode.FSNamesystem: isPermissionEnabled=true
11/08/22 18:09:00 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
11/08/22 18:09:00 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
11/08/22 18:09:00 INFO namenode.NameNode: Caching file names occuring more than 10 times
11/08/22 18:09:00 INFO common.Storage: Image file of size 109 saved in 0 seconds.
11/08/22 18:09:01 INFO common.Storage: Storage directory \tmp\hadoop-nhn\dfs\name has been successfully formatted.
11/08/22 18:09:01 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at nhn-PC/10.64.49.213
************************************************************/


Posted by '김용환'

Map Reduce

Architecture 2008. 5. 14. 21:25

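MapReduce is at the core of Google's search engine.

It appears to be influenced by Lisp, a functional language, and it provides parallel batch processing in a simple, clear form.

The home page and source are as follows.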

http://labs.google.com/papers/mapreduce.html

MapReduce: Simplified Data Processing on Large Clusters
Jeffrey Dean and Sanjay Ghemawat

It has a very simple input/output programming API.

map (in_key, in_value) -> list(out_key, intermediate_value)

reduce (out_key, list(intermediate_value)) -> list(out_value)

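It is used in this form throughout Google's source tree, and it remains in use.

This is the execution model: intermediate values are produced, grouped by key, and each group is then reduced to produce the result.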
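
The map, group-by-key, and reduce flow can be tried out in a few lines. The following is a single-process sketch in Python, using word counting as an illustrative job. The names map_fn, reduce_fn, and run_mapreduce are made up for this example, and nothing here is distributed or taken from Google's implementation.

```python
from collections import defaultdict


def map_fn(in_key, in_value):
    """map(in_key, in_value) -> list of (out_key, intermediate_value)."""
    return [(word, 1) for word in in_value.split()]


def reduce_fn(out_key, intermediate_values):
    """reduce(out_key, list(intermediate_value)) -> list(out_value)."""
    return [sum(intermediate_values)]


def run_mapreduce(inputs, map_fn, reduce_fn):
    """Run map over every input, group intermediate values by key, then reduce."""
    groups = defaultdict(list)
    for in_key, in_value in inputs:
        for out_key, value in map_fn(in_key, in_value):
            groups[out_key].append(value)
    return {key: reduce_fn(key, values) for key, values in sorted(groups.items())}
```

Calling run_mapreduce([("a.txt", "hello world"), ("b.txt", "hello hadoop")], map_fn, reduce_fn) groups the two "hello" entries under one key before reducing them. Each map call and each per-key reduce call is independent of the others, which is what makes the model easy to run in parallel.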

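Parallel processing builds on this as follows.

A simpler explanation is shown in the following figure. (Source: http://www.joinc.co.kr/modules/moniwiki/wiki.php/JCvs/Search/Document/ManReduce?action=download&value=mapreduce.png)

Pipelining over time is also naturally part of the model.

It even takes the problem of re-execution into account.

Since I have no baseline data to compare against, it is hard to say anything definite about performance.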


Posted by '김용환'
