1)
A --query import must include "AND \$CONDITIONS" in its WHERE clause (here, at the end)!!! I lost a lot of time figuring this out.
sqoop...
--query "SELECT id, name
FROM $db_table
WHERE id >=1 AND \$CONDITIONS " \
....
Source: https://sqoop.apache.org/docs/1.4.2/SqoopUserGuide.html
If you want to import the results of a query in parallel, then each map task will need to execute a copy of the query, with results partitioned by bounding conditions inferred by Sqoop. Your query must include the token $CONDITIONS which each Sqoop process will replace with a unique condition expression. You must also select a splitting column with --split-by.
<Note>
If you are issuing the query wrapped with double quotes ("), you will have to use \$CONDITIONS instead of just $CONDITIONS to disallow your shell from treating it as a shell variable. For example, a double quoted query may look like: "SELECT * FROM x WHERE a='foo' AND \$CONDITIONS"
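The quoting rule in the note is easy to check without running Sqoop at all. The sketch below only builds the query strings and prints them, so you can see what the shell would actually hand to sqoop. The table name my_table is a made-up placeholder, not something from my real job.

```shell
#!/bin/sh
# Hypothetical table name, for illustration only.
db_table="my_table"

# Make sure no CONDITIONS variable leaks in from the environment.
unset CONDITIONS

# Double quotes with a backslash: $db_table is expanded by the shell,
# but \$CONDITIONS is passed through as the literal token $CONDITIONS.
escaped="SELECT id, name FROM $db_table WHERE id >= 1 AND \$CONDITIONS"

# Double quotes without the backslash: the shell replaces $CONDITIONS with
# the value of an unset variable (an empty string), so the token is lost.
unescaped="SELECT id, name FROM $db_table WHERE id >= 1 AND $CONDITIONS"

# Single quotes: nothing is expanded, so the table name must be written out.
single='SELECT id, name FROM my_table WHERE id >= 1 AND $CONDITIONS'

printf 'escaped:   %s\n' "$escaped"
printf 'unescaped: %s\n' "$unescaped"
printf 'single:    %s\n' "$single"
```

The escaped and single-quoted forms give the same string. The unescaped one ends in a dangling "AND", which is the broken query I kept sending.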
2)
To run the import in parallel, use --num-mappers.
--num-mappers $num_mappers
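Putting 1) and 2) together, a full invocation might look roughly like the sketch below. It is a dry run that only prints the arguments instead of calling sqoop. The JDBC URL, table name, target directory, and mapper count are all hypothetical placeholders. I am also assuming id is a reasonable column for --split-by, and that --target-dir is needed because the import uses --query.

```shell
#!/bin/sh
# All values below are hypothetical placeholders, not from a real cluster.
db_table="my_table"
num_mappers=4
jdbc_url="jdbc:mysql://db.example.com/mydb"
target_dir="/tmp/sqoop_import_example"

# \$CONDITIONS stays literal so each map task can substitute its own range.
query="SELECT id, name FROM $db_table WHERE id >= 1 AND \$CONDITIONS"

# Build the argument list as positional parameters (portable, no arrays).
set -- sqoop import \
  --connect "$jdbc_url" \
  --query "$query" \
  --split-by id \
  --num-mappers "$num_mappers" \
  --target-dir "$target_dir"

# Dry run: show one argument per line. Replace the two lines below
# with a bare "$@" to actually execute the command.
preview=$(printf '%s\n' "$@")
printf '%s\n' "$preview"
```

Printing one argument per line makes it obvious whether the query arrived as a single argument with $CONDITIONS intact.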