* Thoughts

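Since June there had been a steady stream of news and presentation material saying that Presto was being prepared for an open-source release. On November 7 it was published as open source.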


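After all, SQL over two-dimensional tables is what the human mind handles best, and it seems right that tools are built around it.

Until now we have taken commercial DW products or open-source tools (analysis tools such as Hadoop, Impala, and Hawk) and modified them for internal use (vendor customization). I am not sure whether we will grow more and more dependent on Presto, but if its speed really is this much better, there is no way we will not use it. Hive was so slow that, personally, I never really wanted to use it, and sure enough, I now want to use Presto, which is 8-10x faster...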


---



* Presto

- Apache License 2.0

- Presto has grown to have 850 internal users per day performing 27,000 queries and fiddling with 320TB of data. The existing data warehouse is 250PB in size and growing rapidly: 600TB is added to the warehouse every day.

- 3 regional clusters, successfully scaled to 1,000 nodes

- The data is stored in the Hadoop Distributed File System (HDFS). Some may question why Facebook doesn't just use a SQL DB engine for its queries; the reason is that it needs as few layers of abstraction as possible between the query engine and the underlying HDFS data. For that reason, creating add-ons that interface directly with HDFS, such as Presto, is better for performance than abstracting the storage away.

- 4-7x more CPU-efficient than Hive, 8-10x faster than Hive

- One-time or continuous import from external systems
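
To put the usage figures in the list above in perspective, here is a rough back-of-the-envelope sketch. It assumes that the 850 users, 27,000 queries, and 320TB are all per-day figures and that decimal units apply (1PB = 1,000TB); the constant and function names are made up for illustration, and the results are plain averages rather than anything Facebook reported.

```python
# Figures quoted in the notes above.
# Assumption: all are per-day numbers, and 1 PB = 1000 TB (decimal units).
DAILY_USERS = 850
DAILY_QUERIES = 27_000
DAILY_DATA_TB = 320
WAREHOUSE_PB = 250
DAILY_GROWTH_TB = 600


def derived_figures():
    """Return rough averages: queries per user, GB per query, growth % per day."""
    queries_per_user = DAILY_QUERIES / DAILY_USERS
    gb_per_query = DAILY_DATA_TB * 1000 / DAILY_QUERIES
    daily_growth_pct = DAILY_GROWTH_TB / (WAREHOUSE_PB * 1000) * 100
    return queries_per_user, gb_per_query, daily_growth_pct


if __name__ == "__main__":
    per_user, per_query, growth = derived_figures()
    print(f"queries per user per day: {per_user:.1f}")
    print(f"data touched per query:   {per_query:.1f} GB")
    print(f"warehouse growth per day: {growth:.2f} %")
```

Under those assumptions this works out to roughly 32 queries per user per day, about 12GB touched per query on average, and daily growth of about 0.24% of the warehouse.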



http://www.techsuda.com/archives/2224

https://github.com/facebook/presto

https://www.facebook.com/notes/facebook-engineering/presto-interacting-with-petabytes-of-data-at-facebook/10151786197628920



Video

https://www.facebook.com/photo.php?v=10202463462128185

Posted by '김용환'