[hadoop] Things to watch out for when using sqoop
1)
The query passed to --query must include "AND \$CONDITIONS" in its WHERE clause (here, at the end)!!! Wasted a lot of time figuring this out~
sqoop...
--query "SELECT id, name
FROM $db_table
WHERE id >=1 AND \$CONDITIONS " \
....
Source: https://sqoop.apache.org/docs/1.4.2/SqoopUserGuide.html
If you want to import the results of a query in parallel, then each map task will need to execute a copy of the query, with results partitioned by bounding conditions inferred by Sqoop. Your query must include the token $CONDITIONS which each Sqoop process will replace with a unique condition expression. You must also select a splitting column with --split-by.
<Note>
If you are issuing the query wrapped with double quotes ("), you will have to use \$CONDITIONS instead of just $CONDITIONS to disallow your shell from treating it as a shell variable. For example, a double quoted query may look like: "SELECT * FROM x WHERE a='foo' AND \$CONDITIONS"
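To see why the backslash matters, here is a small shell-only sketch (it does not call sqoop). It assumes CONDITIONS is not set as a variable in your shell, which is the usual case; the variable names unescaped, escaped and single are just for illustration.

```shell
#!/bin/sh
# Assumption: CONDITIONS is not defined in the shell, as is normally the case.
unset CONDITIONS

# Unescaped inside double quotes: the shell expands $CONDITIONS to an empty
# string, so sqoop would never see the token.
unescaped="SELECT id, name FROM x WHERE id >= 1 AND $CONDITIONS"

# Escaped inside double quotes: the literal token survives.
escaped="SELECT id, name FROM x WHERE id >= 1 AND \$CONDITIONS"

# Single quotes: the shell does no expansion, so no backslash is needed.
single='SELECT id, name FROM x WHERE id >= 1 AND $CONDITIONS'

printf '%s\n' "$unescaped" "$escaped" "$single"
```

The first line printed ends right after "AND", which is exactly the broken query sqoop would receive without the backslash.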
2)
To run the import in parallel, use --num-mappers to set the number of map tasks.
--num-mappers $num_mappers
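Putting both notes together, a parallel free-form import could be assembled roughly like this. This is only a sketch: the JDBC URL, table name, target directory and mapper count are made-up placeholders, and the script prints the command instead of running it. --split-by is there because the user guide quoted above requires a splitting column for parallel query imports, and sqoop needs --target-dir when --query is used.

```shell
#!/bin/sh
# Hypothetical placeholder values; replace them with your own.
db_table="my_table"
num_mappers=4
connect_url="jdbc:mysql://db.example.com/mydb"   # assumed database URL
target_dir="/tmp/sqoop_import_example"           # assumed HDFS output directory

# Double-quoted query, so $CONDITIONS must be escaped while $db_table expands.
query="SELECT id, name FROM $db_table WHERE id >= 1 AND \$CONDITIONS"

# Print the command rather than executing it; remove 'echo' to really run it.
echo sqoop import \
  --connect "$connect_url" \
  --query "$query" \
  --split-by id \
  --num-mappers "$num_mappers" \
  --target-dir "$target_dir"
```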